“Hand Sign Detection”
A Major Project Report Submitted to
Rajiv Gandhi Proudyogiki Vishwavidyalaya
Towards Partial Fulfillment for the Award of
Bachelor of Technology in Computer Science
Engineering
Submitted by: Guided by:
Anmol Mayank (0827CS211021) Prof. Bharti Bhattad
Anushka Pateriya Computer Science and
(0827CS211031) Engineering
Arushi Puranik (0827CS211042)
Asmika Jain(0827CS211051)
Avyan Soni (0827CS211053)
Acropolis Institute of Technology & Research, Indore
Jan-Jun 2025
EXAMINER APPROVAL
The Major Project entitled “Hand Sign Detection” submitted by Anmol
Mayank (0827CS211021), Anushka Pateriya (0827CS211031),
Arushi Puranik (0827CS211042), Asmika Jain (0827CS211051),
Avyan Soni (0827CS211053)
has been examined and is hereby approved towards partial fulfilment for
the award of Bachelor of Technology degree in Computer Science
Engineering discipline, for which it has been submitted. It is understood
that by this approval the undersigned do not necessarily endorse or
approve any statement made, opinion expressed, or conclusion drawn
therein, but approve the project only for the purpose for which it has
been submitted.
(Internal Examiner) (External
Examiner)
Date: Date:
I
RECOMMENDATION
This is to certify that the work embodied in this Major project entitled
“Hand Sign Detection” submitted by Anmol Mayank
(0827CS211021), Anushka Pateriya(0827CS211031), Arushi
Puranik (0827CS211042), Asmika Jain (0827CS211051), Avyan
Soni (0827CS211053) is a satisfactory account of the bonafide work
done under the supervision of Prof. Bharti Bhattad, is recommended
towards partial fulfilment for the award of the Bachelor of Technology
(Computer Science Engineering) degree by Rajiv Gandhi Proudyogiki
Vishwavidhyalaya, Bhopal.
(Project Guide)
(Project Coordinator)
(Dean Academics)
II
STUDENTS UNDERTAKING
This is to certify that the Major project entitled “Hand Sign Detection”
has developed by us under the supervision of Prof. Bharti Bhattad.
The whole responsibility of the work done in this project is ours. The
sole intension of this work is only for practical learning and research.
We further declare that to the best of our knowledge; this report does
not contain any part of any work which has been submitted for the
award of any degree either in this University or in any other University
/ Deemed University without proper citation and if the same work
found then we are liable for explanation to this.
Anmol Mayank (0827CS211021),
Anushka Pateriya (0827CS211031),
Arushi Puranik (0827CS211042),
Asmika Jain (0827CS211051),
Avyan Soni (0827CS211053)
III
Acknowledgement
We thank the almighty Lord for giving me the strength and courage to
sail out through the tough and reach on shore safely.
There are number of people without whom this project would not
have been feasible. Their high academic standards and personal
integrity provided me with continuous guidance and support.
We owe a debt of sincere gratitude, deep sense of reverence and
respect to our guide and mentor Prof. Bharti Bhattad, Professor,
AITR, Indore for his motivation, sagacious guidance, constant
encouragement, vigilant supervision, and valuable critical
appreciation throughout this project work, which helped us to
successfully complete the project on time.
We express profound gratitude and heartfelt thanks to Dr Kamal
Kumar Sethi, Professor & Head CSE, AITR Indore for his support,
suggestion, and inspiration for carrying out this project. I am very
much thankful to other faculty and staff members of the department
for providing me all support, help and advice during the project. We
would be failing in our duty if do not acknowledge the support and
guidance received from Dr S.C. Sharma, Director, AITR, Indore
whenever needed. We take opportunity to convey my regards to the
management of Acropolis Institute, Indore for extending academic and
administrative support and providing me all necessary facilities for
project to achieve our objectives.
We are grateful to our parent and family members who have always
loved and supported us unconditionally. To all of them, we want to say
“Thank you”, for being the best family that one could ever have and
without whom none of this would have been possible.
Anmol Mayank(0827CS21102), Anushka
Pateriya(0827CS211031), Arushi Puranik (0827CS211042),
Asmika Jain(0827CS211051), Avyan Soni (0827CS211053)
IV
Executive Summary
This project is submitted to Rajiv Gandhi Proudyogiki
Vishwavidyalaya, Bhopal (MP), India, as part of the partial fulfillment
of the Bachelor of Engineering in Information Technology under the
sagacious guidance and vigilant supervision of Prof. Bharti Bhattad.
Sign language is a critical communication tool for individuals with
hearing and speech impairments, yet its limited understanding among
the general population creates significant social and professional
barriers. This research proposes a real-time hand sign detection
system that leverages computer vision and deep learning to bridge
this gap, enabling seamless interaction between sign language users
and non-users. The system employs Convolutional Neural Networks
(CNNs) for spatial feature extraction and Recurrent Neural Networks
(RNNs) for temporal sequence modeling, achieving accurate
recognition of hand gestures and their conversion into text or speech.
By processing video input at 20–30 frames per second, the system
ensures efficient real-time performance suitable for everyday use.
Preliminary evaluations suggest an accuracy of 85–95% on a
vocabulary of 100–200 signs, with potential scalability to larger
datasets.
V
List of Figures
Figure 1-1: Methodology.......................................................................................................................16
Figure 1-2: Counting of eggs and soda bottle..............................................................................17
Figure 1-3: Counting people at railway stations and airport...............................................17
Figure 3-1: Block Diagram................................................................................................................... 29
Figure 3-2: R-CNN: Regions with CNN Features........................................................................31
Figure 3-3: Fast R-CNN Architecture...............................................................................................31
Figure 3-4: YOLO Architecture...........................................................................................................31
Figure 3-5: Faster CNN Architecture...............................................................................................32
Figure 3-6: Bounding-Box.....................................................................................................................33
Figure 3-7: Data Flow Diagram Level 0..........................................................................................33
Figure 3-8: Data Flow Diagram Level 1..........................................................................................33
Figure 4-1: Deep Learning................................................................................................................... 37
Figure 4-2: Neural Networks.............................................................................................................. 38
Figure 4-3: TensorFlow Architecture..............................................................................................39
Figure 4-4: TensorFlow Working......................................................................................................40
Figure 4-5: Data Structure in JSON Format..................................................................................42
Figure 4-6: Objects in training set....................................................................................................42
Figure 4-7: Instances per Category..................................................................................................43
Figure 4-8: Comparison Graphs.........................................................................................................43
Figure 4-9: Screenshot 1....................................................................................................................... 45
Figure 4-10: Screenshot 2....................................................................................................................45
Figure 4-11: Screenshot 3....................................................................................................................45
VI
List of Tables
Table 1: Hand Sign Gesture
...................................................................................................................................................................
32
Table 2: Testing Table
...................................................................................................................................................................
34
Table 3: Test Cases Table
...................................................................................................................................................................
40
Table 4: Test Case 1
...................................................................................................................................................................
46
Table 5: Test Case 2
...................................................................................................................................................................
47
VII
List of Abbreviation
Abbr1: R- CNN- Regional based Convolutional Neural Networks
Abbr2: OpenCV- Open-Source Computer Vision
Abbr3: GPU- Graphical Processing Unit
VIII
Table of Contents
1.1. Overview..........................................................................................................
1.2. Background and Motivation..............................................................................
1.3. Problem Statement and Objectives..................................................................
1.4. Scope of the Project.........................................................................................
1.5. Team Organization............................................................................................
1.6. Report Structure...............................................................................................
2.1 Preliminary Investigation...................................................................................
2.1.1 Current System.................................................................................................
2.2 Limitations of Current System...........................................................................
2.3 Requirement Identification and Analysis for Project..........................................
2.4 Conclusion.........................................................................................................
3.1 The Proposal....................................................................................................
3.2 Benefits of the Proposed System.....................................................................
3.3 Block Diagram.................................................................................................
3.4 Feasibility Study..............................................................................................
3.4.1 Technical.......................................................................................................
3.4.2 Economical....................................................................................................
3.4.3 Operational...................................................................................................
3.5 Design Representation....................................................................................
3.6 Database Structure............................................................................................
3.7 Deployment Requirements..............................................................................
3.7.1 Software Requirements....................................................................................
3.7.2 Hardware.......................................................................................................
4.1 Technique Used :.............................................................................................
4.2 Tools Used.......................................................................................................
4.2.1 OpenCV:........................................................................................................
4.2.2 Tensor Flow:..................................................................................................
4.3 Language Used................................................................................................
4.4 Screenshots.....................................................................................................
4.5 Testing……..………..
………………………………………………………………………………………..…….29
4.5.1 Strategy Used...................................................................................................
5.1 Conclusion.......................................................................................................
5.2 Limitations of the Work..................................................................................
IX
5.3 Suggestions and Recommendations for Future Work.....................................
Bibliography..........................................................................................................................................
Project Plan............................................................................................................................................
Guide Interaction Sheet.......................................................................................................................
Table 11: Guide Interaction Sheet.....................................................................................................
X
Hand Sign Detection
Chapter 1. Introduction
Introduction
Real-time hand sign detection is a cutting-edge application of computer
vision and artificial intelligence that aims to recognize and interpret hand
gestures in real-time. This technology holds immense potential for a wide
range of practical applications, with one of the most impactful being sign
language recognition for improved communication between sign language
users and the rest of society.
Sign language is a rich and expressive means of communication for the deaf
and hard of hearing community, allowing them to convey complex ideas,
emotions, and information using hand shapes, movements, and facial
expressions. However, the majority of the population does not understand
sign language, leading to communication barriers and exclusion for those
who rely on it [1].
1.1. Overview
A project focused on hand sign language detection involves designing and
implementing a system that can recognize and interpret hand signs and
gestures used in sign language. Below is an overview of the key components
and steps involved in such a project:
Gather a diverse dataset of hand sign language gestures. The dataset should
include various signs, letters, numbers, and common phrases in sign
language.
Annotate each image or video frame in the dataset with the corresponding
sign or gesture label. Perform data preprocessing, including resizing,
normalization, and augmentation to enhance the dataset's diversity and
quality. Record gestures from different viewpoints (frontal, side, tilted) to
make the model adaptable to non-standard hand positions.
Use high-quality cameras to ensure clarity of gesture details, which can later
be down sampled for training if needed.
Page 11 of 60
Hand Sign Detection
1.2. Background and Motivation
Hand sign detection is a crucial area of research in computer vision and
human-computer interaction. It plays a significant role in applications such
as sign language recognition, gesture-based control, virtual reality, and
robotics. The ability of machines to interpret hand gestures enhances
accessibility for people with disabilities and provides an intuitive way to
interact with digital systems.
Recent advancements in deep learning and computer vision have
significantly improved the accuracy and efficiency of hand gesture
recognition. Techniques such as Convolutional Neural Networks (CNNs),
Media pipe, and deep learning-based models like YOLO and Open Pose have
been widely used for detecting and classifying hand gestures. These
technologies enable real-time gesture recognition, making them suitable for
various real-world applications.
This project aims to bridge the communication gap for sign language users by
automating hand sign recognition. Beyond accessibility, it has applications in
gaming, smart homes, and virtual reality.
1.3. Problem Statement and Objectives
The objective of hand sign detection is to develop a robust and accurate
system capable of recognizing and interpreting hand gestures in real- time
from image or video data. This technology serves various applications
including sign language translation, human- computer interaction, virtual
reality, and augmented reality. The primary goal is to create a system that can
identify a wide range of hand signs with high accuracy, regardless of
variations in lighting conditions, background clutter, hand orientation, and
occlusions. Additionally, the system should be efficient enough to run on
various platforms including mobile devices, enabling widespread
deployment and accessibility.
1.3.1. Objective 1: The fundamental goal is to enable effective
communication for individuals with hearing impairments. Hand sign
language detection allows them to convey their thoughts, needs, and
emotions to others who may not understand sign language.
Page 12 of 60
Hand Sign Detection
1.3.2. Objective 2: The technology aims to provide accessibility for people
with hearing impairments in various aspects of life, including
education, employment, healthcare, and social interactions. It ensures
that they can participate fully in society without communication
barriers.
1.3.3. Objective 3: Promote inclusivity by ensuring that individuals with
hearing impairments are not excluded or isolated due to their
communication needs. Hand sign language detection helps create a
more inclusive society where everyone can communicate with ease.
1.3.4. Objective 4: Support the education of individuals with hearing
impairments by providing tools and resources that aid in learning and
communication. Hand sign language detection can be used in
educational settings to facilitate teaching and learning sign language.
1.4. Scope of the Project
The development of a real-time hand sign detection system holds significant
potential for enhancing communication and accessibility for individuals with
hearing impairments. By leveraging machine learning and computer vision,
the system aims to recognize a wide range of hand gestures accurately and
efficiently, enabling seamless interaction through intuitive interfaces. This
technology not only facilitates communication between those who use sign
language and those who do not but also promotes inclusivity across various
environments such as schools, workplaces, healthcare facilities, and public
spaces. It supports sign language education by offering interactive tools and
applications that aid in learning and practicing signs, while also serving as a
foundation for assistive technologies integrated into smart devices like
smartphones and smart glasses. Additionally, it provides valuable support for
sign language interpreters, enhancing their ability to translate between
spoken and signed languages in real-time. Through continuous updates and
user feedback, the system aspires to maintain robustness in diverse
conditions, ensuring reliability, usability, and improved accessibility for all
users.
Page 13 of 60
Hand Sign Detection
1.5. Team Organization
1.5.1. Anmol Mayank: Along with doing preliminary investigation and
understanding the limitations of the current system, I studied the
topic and its scope and surveyed various research papers related to
the project and the technology that is to be used. Documentation is
also a part of the work done by me on this project.
1.5.2. Anushka Pateriya: I investigated and found the right technology and
studied deeply about it. For the implementation of the project, I
collected the dataset. Documentation is also a part of the work done
by me on this project.
1.5.3. Arushi Puranik: I am responsible for leading the technical aspects of
the system, designing the architecture of the fingerprint
authentication mechanism. I ensured that the system is both secure
and scalable, preventing fraud while allowing for smooth and efficient
voter authentication.
1.5.4. Asmika Jain: I handle the design, creation, and maintenance of the
database that stores voter information and election data. This ensures
data is securely stored and accessible in real-time, providing critical
support for election management. Documentation is also a part of the
work done by me in this project.
1.5.5. Avyan Soni: I am responsible for thoroughly testing the system,
identifying potential bugs or vulnerabilities, and ensuring the system
performs reliably under various conditions. Documentation is also a
part of the work done by me in this project.
1.6 Report Structure
The project report is categorized into five chapters.
Chapter 1: Introduction- explains the problem's history before outlining the
project's justification. The project's goals, parameters, and uses
are all covered in this chapter. The chapter also includes
information about the team members and their contributions to
the project's progress. It concludes with a report outline.
Page 14 of 60
Hand Sign Detection
Chapter 2: Review of Literature- explores the work done in the area of
Project undertaken and discusses the limitations of existing
system and highlights the issues and challenges of project area.
The chapter finally ends up with the requirement identification
for present project work based on findings drawn from reviewed
literature and end user interactions.
Chapter 3: Proposed System - starts with the project proposal based on
requirement identified, followed by benefits of the project. The
chapter also illustrate software engineering paradigm used along
with different design representation. The chapter also includes
block diagram and details of major modules of the project.
Chapter also gives insights of different type of feasibility study
carried out for the project undertaken. Later it gives details of
the different deployment requirements for the developed
project.
Chapter 4: Implementation - includes the details of different Technology/
Techniques/ Tools/ Programming Languages used in developing
the Project. The chapter also includes the different user interface
designed in project along with their functionality. Further it
discuss the experiment results along with testing of the project.
The chapter ends with evaluation of project on different
parameters like accuracy and efficiency.
Chapter 5: Conclusion - Concludes with objective wise analysis of results
and limitation of present work which is then followed by
suggestions and recommendations for further improvement.
Page 15 of 60
Hand Sign Detection
Chapter 2. Review of Literature
Review of Literature
The Hand sign detection has been extensively studied in computer vision,
with various approaches ranging from traditional image processing
techniques to deep learning models. Early methods relied on contour
detection and skin segmentation, which were limited by environmental
conditions. Modern approaches use deep learning models like CNNs, RNNs,
and Transformer- based architectures, significantly improving accuracy.
Google’s Media pipe Hands and models like YOLO have shown promising
real-time detection results. Despite advancements, challenges such as hand
occlusion, background variations, and computational efficiency persist.
2.1 Preliminary Investigation
Initial research involved exploring existing hand sign recognition techniques,
datasets, and available frameworks. Popular datasets like the American Sign
Language (ASL) Dataset and Sign Language MNIST were analyzed.
Frameworks like OpenCV, TensorFlow, and Media pipe were evaluated for
their efficiency in real-time detection. The study also involved testing
different model architectures to determine their suitability for the project’s
objectives.
2.1.1 Current System
Existing hand sign detection systems primarily rely on machine
learning and deep learning models. Many implementations use CNNs
and Media pipe for hand landmark detection. These systems can
recognize static gestures but struggle with dynamic gestures and real-
time performance in uncontrolled environments. Most existing
systems require high computational power. Another widely used tool
is Google’s Media pipe, an open-source framework that provides real-
time hand tracking and landmark detection with relatively low latency,
even Achieving high accuracy and responsiveness in real-time
applications remains a challenge, especially on low-powered or edge
devices. High frame-rate processing often requires GPUs or specialized
hardware accelerators like TPUs, which limit deployment in resource-
constrained environments.
Page 16 of 60
Hand Sign Detection
2.2 Limitations of Current System
Despite its promising potential, real-time hand sign detection systems face
several limitations that affect their overall performance and usability. One of
the primary challenges is their dependency on lighting conditions, as poor or
inconsistent lighting can significantly reduce gesture recognition accuracy.
Similarly, complex or dynamic backgrounds can interfere with the system’s
ability to isolate and interpret hand signs correctly. Hand occlusion, where
parts of the hand are hidden from the camera, poses another obstacle by
disrupting the recognition process. Moreover, many existing models are
limited to static gestures, lacking the capability to interpret dynamic or
continuous sign language sequences. Additionally, achieving real-time
performance often demands high computational resources, which can hinder
deployment on low-powered devices like smartphones or wearables.
Addressing these limitations is essential for creating a more robust,
accessible, and widely applicable hand sign detection system.
2.3 Requirement Identification and Analysis for Project
The Software Requirements Specification (SRS) for the hand sign detection
system delineates the essential guidelines and objectives for its development.
Utilizing computer vision and machine learning, the system aims to
accurately interpret hand gestures, fostering improved human-computer
interaction and accessibility. Its scope encompasses diverse applications,
including sign language recognition and gesture-based control interfaces.
This document provides a comprehensive overview of functional and non-
functional requirements, ensuring clarity and alignment among development
teams, stakeholders, and end-users. By delineating system constraints and
intended features, the SRS establishes a robust foundation for the design,
implementation, and validation of the hand sign.
Hand sign detection is a crucial technology with diverse applications in
various industries, including accessibility, healthcare, automotive, and
consumer electronics. This product overview provides an overview of a Hand
Sign Detection System.
Page 17 of 60
Hand Sign Detection
2.3.1. Real-time Detection: Our Hand Sign Detection System is capable of
real-time detection, enabling instant response to hand gestures.
2.3.2. High Accuracy: We use advanced deep learning algorithms to achieve
high accuracy in hand sign detection.
2.3.3. Flexibility and Customization: The system is highly customizable
and adaptable to specific use cases.
2.3.4. Multi-modal Input: The system can process hand signs from various
input sources, including image or video feeds from cameras, depth
sensors, or even data gloves.
2.3.5. User-Friendly Interface: Includes an intuitive graphical user
interface (GUI) for visualizing gesture detection results in real-time
and managing datasets or custom gestures.
2.3.6. Gesture Customization and Training Module: Users can add new
gestures and retrain the model with minimal data, making the system
extensible and personalized for different contexts or local sign
language dialects.
2.3.7. Analytics and Logging: Keeps track of usage patterns, errors, and
success rates for performance tuning and research purposes.
2.4 Conclusion
The review of literature provided a comprehensive understanding of the
existing research, methodologies, and technologies used in hand sign
detection. The Preliminary Investigation explored various deep learning
models, datasets, and frameworks, highlighting the effectiveness of CNNs,
RNNs, and Media pipe for gesture recognition. The Current System analysis
revealed that while existing models achieve good accuracy in controlled
environments, they struggle with real-world conditions such as lighting
variations, background complexity, and hand occlusions.
The Limitations of the Current System emphasized the challenges that need
to be addressed, including computational efficiency, real-time performance,
and adaptability to different conditions. Based on these findings, the
Requirement Identification and Analysis outlined the necessary functional
Page 18 of 60
Hand Sign Detection
and non- functional requirements for building a more robust and efficient
hand sign detection system.
This chapter establishes the foundation for the proposed project by
identifying the gaps in current solutions and defining key objectives to
enhance the accuracy, speed, and usability of the system. The insights gained
will guide the design and development of a practical and scalable hand sign
detection model.
Page 19 of 60
Hand Sign Detection
Chapter 3. Proposed System
Proposed System
3.1 The Proposal
The Hand Sign Detection system is a robust real-time application aimed at
revolutionizing human-computer interaction and promoting inclusivity for
individuals with hearing impairments. This system utilizes advanced
computer vision techniques and deep learning models (CNNs and RNNs) to
accurately interpret and recognize hand gestures. The primary objective is to
bridge the communication gap, enabling seamless interactions across diverse
domains, including accessibility, education, healthcare, and entertainment.
3.2 Benefits of the Proposed System
3.2.1. Enhanced Accessibility: Facilitates interaction between individuals
who rely on sign language and the broader community, breaking
communication barriers.
3.2.2. Education Empowerment: Provides tools for learning and teaching
sign language effectively, fostering inclusivity in educational
environments.
3.2.3. Efficiency: Real-time processing ensures prompt responses, making it
suitable for time-sensitive applications.
3.2.4. Versatility: Supports a wide range of use cases, such as assistive
technologies, virtual environments, and gesture-based controls.
3.2.5. Innovative Integration: Adaptable to various devices like
smartphones, tablets, and smart glasses, ensuring widespread
usability.
Page 20 of 60
Hand Sign Detection
3.3 Block Diagram
Fig 3.3.1 Block Diagram
The block diagram represents a real-time hand sign detection system using a
CNN model (VGG-16). It begins with capturing an input gesture through a
sensor, followed by image preprocessing to enhance quality. Gesture
segmentation isolates the hand region from the background. The segmented
image is then passed through the VGG-16 model for feature extraction and
gesture classification. The final output is the recognized gesture, which can
be used in various applications such as sign language translation or human-
computer interaction.
3.4 Feasibility Study
A feasibility study is an analysis of how successfully a system can be
implemented, accounting for factors that affect it such as economic, technical,
and operational factors to determine its potential positive and negative
outcomes before investing a considerable amount of time and money into it.
Page 21 of 60
Hand Sign Detection
analyze whether the Hand Sign detection can be practically implemented,
considering technical, operational, and economic factors.
3.4.1 Technical
The proposed system is technically robust, employing advanced machine
learning techniques such as Convolutional Neural Networks (CNNs) for
spatial feature extraction and Recurrent Neural Networks (RNNs) with LSTM
cells to capture the temporal dynamics of hand gestures. This combination
enables accurate and reliable recognition, even under challenging conditions
like inconsistent lighting and complex backgrounds. The system supports
real-time processing of video frames with minimal latency, allowing smooth
and natural user interaction. Essential preprocessing techniques, including
resizing, normalization, and data augmentation, improve model performance
and consistency, while additional steps like background subtraction and hand
segmentation further enhance gesture isolation. Designed to be hardware-
compatible, the system works efficiently with standard RGB cameras, depth
sensors, and GPUs, ensuring flexibility across a range of devices. Its scalable
architecture also allows for easy integration of new gestures or sign
languages, making it a future-ready solution for various communication and
accessibility applications.
3.4.2 Economical
The system offers a cost-effective and scalable solution with a strong return
on investment, making it ideal for widespread adoption. By leveraging open-
source machine learning frameworks such as TensorFlow or PyTorch,
development costs are significantly reduced compared to proprietary
alternatives. It operates efficiently on basic hardware like standard webcams
and consumer-grade computers, removing the need for expensive,
specialized sensors. This affordability, combined with minimal deployment
and operational costs using existing infrastructure, enhances its economic
viability. Furthermore, the system has broad market potential across sectors
such as accessibility, education, healthcare, and assistive technologies, with
opportunities for additional funding through government and institutional
grants aimed at promoting inclusivity and accessibility.
Page 22 of 60
Hand Sign Detection
3.4.3 Operational
The system is built with real-world usability and operational efficiency at its
core, featuring a user-friendly interface with intuitive controls that make it
accessible to users of all technical backgrounds. It delivers real-time gesture
recognition, enabling seamless applications such as live sign language
interpretation, smart device control, and interaction with virtual assistants. A
built-in feedback mechanism provides users with immediate responses to
ensure recognition accuracy and enhance user satisfaction, while also
allowing for continuous improvement through iterative learning. Its cross-
platform compatibility across Windows, macOS, Android, and iOS ensures
broad accessibility and ease of deployment. Furthermore, the system
supports ongoing enhancements through regular updates driven by user
feedback and advancements in technology, maintaining its effectiveness and
relevance in evolving environments.
Figure 3-2: OPERATIONAL STUDY
Page 23 of 60
Hand Sign Detection
3.5 Design Representation
Figure 3-3: R-CNN: Regions with CNN Features
Figure 3-4 : Fast R-CNN Architecture
Page 24 of 60
Hand Sign Detection
Figure 3-: YOLO Architecture
Figure 3-6: Faster CNN Architecture
Page 25 of 60
Hand Sign Detection
Figure 3-7 : Bounding-Box
3.5.5 Activity Modelling Diagram
Fig. 3.5.5 Activity Modelling Diagram
The simplified activity of the hand sign detection system begins with
launching the application and capturing the user's hand gesture through a
Page 26 of 60
Hand Sign Detection
camera. The captured image undergoes basic preprocessing, such as resizing
and normalization, followed by hand region segmentation. Key features are
then extracted using a CNN model (VGG-16), and the gesture is classified.
Finally, the recognized gesture is displayed to the user, completing the core
real-time recognition process.
3.5.6 Data Flow Diagram
Fig. 3.5.6 Data Flow Diagram Level-0
Fig. 3.5.7 Data Flow Diagram Level-1
Page 27 of 60
Hand Sign Detection
The simplified activity of the hand sign detection system begins with
launching the application and capturing the user's hand gesture through a
camera. The captured image undergoes basic preprocessing, such as resizing
and normalization, followed by hand region segmentation. Key features are
then extracted using a CNN model (VGG-16), and the gesture is classified.
Finally, the recognized gesture is displayed to the user, completing the core
real-time recognition process.
3.6 Database Structure
The database for the Hand Sign Detection system is a critical component:
Data Composition: Contains annotated datasets representing a diverse
array of gestures, including alphabets, numbers, and common phrases. This
diversity enhances the system's recognition ability across demographic and
cultural variances.
Preprocessing: Techniques like normalization, noise reduction, and data
augmentation ensure the quality and robustness of the dataset.
Extensibility: The database structure is designed to accommodate the
addition of new gestures and regional sign language variations without
requiring major structural changes.
The “Logs” table has the following structure:
Field Data Type Description
Datetime Timestam Shows the Complete date and time when the
p gesture is identified
Type Varchar2 Displays the type of Hand Gestures
CIF Number Count Per Frame. It tells the number of objects in
frame.
Table 1. Database Structure
3.7 Deployment Requirements
There are various requirements (hardware, software, and services) to
successfully deploy the system. Deployment requirements ensure that your hand
sign detection project can move beyond a demo and into real, sustained use—across
Page 28 of 60
Hand Sign Detection
different devices, locations, and users—while remaining fast, secure, scalable, and
reliable.
These are mentioned below :
3.7.1 Software Requirements
[Link] Programming Language: Python, widely used for AI,
computer Vision and prototyping.
[Link] Libraries/Frameworks:
TensorFlow – for building and training deep learning models
(e.g., CNNs, RNNs).
OpenCV – for capturing and processing video frames.
MediaPipe – for efficient hand tracking and gesture detection.
Flask – for developing a web-based interface (optional, if web
UI is used).
[Link] UML Design Tools:
StarUML / Lucidchart / Visual Paradigm / [Link]
Used for drawing Use Case, Class, Sequence, Activity, and
State Diagrams.
[Link] IDE and Code Editors:
PyCharm / Visual Studio Code (VS Code)
For writing, testing, and debugging Python code.
[Link] Version Control:
Git (optional but recommended)
For version control and collaboration.
[Link] Testing Tools:
Manual Testing via interface and output checks
PyTest or unittest module in Python for writing automated
test cases
3.7.2 Hardware Requirements
[Link] Computer System: A capable computer with a modern
processor (dual-core or higher), sufficient RAM (at least 8GB),
and ample storage space (SSD is recommended for faster
performance).
Page 29 of 60
Hand Sign Detection
[Link] Mobile Device: For testing responsiveness and behaviour on
different screen sizes. This helps ensure our application works
well on various devices.
[Link] Data Storage: Adequate storage solutions are needed to store
user data, chat histories, and application-related information.
The storage infrastructure should support data redundancy
and backups to prevent data loss.
[Link] Cameras(RGB or depth Sensors) for accurate gestures capture.
[Link] High- performance GPUs or AI accelerators for computational
efficiency. Storage for datasets and pre- trained models.
[Link] Network Connectivity: Wi-Fi / Ethernet for data sync, cloud
services, or real-time model updates
[Link] Optional: 4G/5G module for remote deployment scenarios
Page 30 of 60
Hand Sign Detection
Chapter 4. Implementation
Implementation
For the problem of detecting and tracking hand movements manually, the system
is designed to automate the process by utilizing a camera that captures real-time
video feed. The feed is processed using Media Pipe’s holistic model to detect and
draw landmarks for the face, hands, and pose. OpenCV is used to display processed
frames, allowing accurate visualization of hand gestures and movements in real
time
4.1 Technique Used :
4.1.1 Media Pipe Holistic Model:
Media Pipe Holistic is a comprehensive model that integrates multiple
models for detecting face, pose, and hand landmarks simultaneously. It
uses advanced deep learning techniques to identify and connect key
points of the hands, making it ideal for applications that require real-
time gesture recognition. The model processes each frame captured by
the camera and converts it into a structured format where landmarks
are extracted and mapped to predefined points.
This ensures high accuracy in detecting hand movements and
maintaining consistent tracking across multiple frames. Media Pipe
Holistic also provides built-in drawing utilities that visualize the
detected landmarks, making it easier to interpret the results. The
model's robustness and ability to handle different lighting conditions
and complex gestures make it a reliable choice for real-time hand
detection and tracking applications.
Its lightweight architecture ensures smooth performance even on low-
power devices, making it suitable for a wide range of applications.
Page 31 of 60
Hand Sign Detection
Figure 4-1 : Media Pipe Holistic
4.1.2 OpenCV for Real-Time Video Processing:
OpenCV (Open-Source Computer Vision Library) is a powerful tool used
for capturing and processing real-time video feeds from a camera. It is
widely used in computer vision applications due to its speed and
flexibility. In this project, OpenCV is responsible for reading frames from
the camera, converting them from BGR to RGB format for compatibility
with Media Pipe, and displaying the processed frames with detected
landmarks.
OpenCV provides efficient frame-by-frame processing, enabling smooth
visualization of real-time hand gestures. It also includes utilities for
resizing, rotating, and enhancing video frames, allowing for additional
preprocessing if required. The integration of OpenCV with Media Pipe
ensures that the system can accurately display and track hand
movements without noticeable delay.
By leveraging OpenCV’s capabilities, the system can handle high frame
rates and ensure that real-time feedback is provided, making it an
essential component.
Page 32 of 60
Hand Sign Detection
Figure 4-2: OpenCV for Video Processing
4.2 Tools Used
4.2.1 OpenCV:
OpenCV (Open-Source Computer Vision Library) is released under a
BSD license and hence it’s free for both academic and commercial use. It
has C++, Python and Java interfaces and supports Windows, Linux, Mac
OS, iOS and Android. OpenCV was designed for computational efficiency
and with a strong focus on real-time applications. Written in optimized
C/C++, the library can take advantage of multi-core processing. Enabled
with OpenCL, it can take advantage of the hardware acceleration of the
underlying heterogeneous computer platform. Adopted all around the
world, OpenCV has more than 47 thousand people in the user
community and an estimated number of downloads exceeding 14
million. Usage ranges from interactive art to mines inspection, stitching
maps on the web or through advanced robotics.
Page 33 of 60
Hand Sign Detection
4.2.2 Tensor Flow:
TensorFlow is an open-source software library for high performance
numerical computation. Its flexible architecture allows easy
deployment of computation across a variety of platforms (CPUs, GPUs,
TPUs), and from desktops to clusters of servers to mobile and edge
devices. Originally developed by researchers and engineers from the
Google Brain team within Google’s AI organization, it comes with strong
support for machine learning and deep learning and the flexible
numerical computation core is used across many other scientific
domains.
Figure 4-3 : TensorFlow Architecture
Page 34 of 60
Hand Sign Detection
Figure 4-4 : TensorFlow Working
4.2.3 Models:
To enhance the accuracy and performance of the hand detection system,
several advanced models can be integrated to complement or replace
existing solutions. YOLO (You Only Look Once) is a highly efficient real-time
object detection model that processes an entire image in one pass, making it
ideal for detecting and recognizing hands or gestures quickly. OpenPose is a
deep learning model that excels in detecting key points of the human body,
face, and hands, making it suitable for multi-person scenarios where high
accuracy in hand landmark detection is required.
SSD (Single Shot MultiBox Detector) is a lightweight model designed for
real- time object detection, capable of detecting hands and gestures
efficiently, making it ideal for mobile and embedded devices. Faster R-CNN
is a high- accuracy object detection model that uses a region proposal
network (RPN) to identify objects with superior precision, making it useful
for applications where accuracy is prioritized over speed. Integrating these
Page 35 of 60
Hand Sign Detection
models can significantly enhance the system’s robustness, flexibility, and
real-time performance.
Below is the list of models:
YOLO (You Only Look Once) Model: YOLO is a highly efficient real-
time object detection model that processes an entire image in a
single pass, making it ideal for detecting and recognizing hands or
gestures quickly. It is known for its speed and accuracy, capable of
identifying multiple objects within a frame. YOLO can be fine-tuned
to detect custom hand gestures, enabling gesture- based applications
with minimal latency.
• Open Pose Model: OpenPose is a deep learning model that detects
key points of the human body, face, and hands. It excels in multi-
person scenarios, making it useful for applications where accurate
tracking of multiple users is required. OpenPose provides detailed
pose estimation and hand landmark detection, making it a suitable
alternative to MediaPipe for scenarios demanding high accuracy in
complex environments.
• SSD (Single Shot MultiBox Detector): SSD is a lightweight object
detection model that divides an image into grids and predicts
bounding boxes and class labels efficiently. It is designed for real-
time applications and performs well on mobile and embedded
devices. SSD can be trained to detect hands and gestures, providing
an additional layer of detection that works well with low- latency
systems.
• Faster R-CNN (Region-Based Convolutional Neural Network): Faster
R-CNN is a high-accuracy object detection model that uses a region
proposal network (RPN) to identify objects in an image. While it is
slower than YOLO and SSD, it provides superior accuracy, making it
ideal for applications where precision is more critical than speed.
Faster R-CNN can be trained to detect hands and specific gestures,
enhancing the robustness of the system.
Page 36 of 60
Hand Sign Detection
4.3 Language Used
4.3.1 Python:
Python is a simple and minimalistic language. Reading a good Python
program feels almost like reading English (but very strict English!). This
pseudo-code nature of Python is one of its greatest strengths. It allows
you to concentrate on the solution to the problem rather than the
syntax i.e. the language itself.
4.3.2 Free and Open source:
Python is an example of a FLOSS (Free/Libre and Open Source
Software). In simple terms, you can freely distribute copies of this
software, read the software's source code, make changes to it, use
pieces of it in new free programs, and that you know you can do these
things. FLOSS is based on the concept of a community which shares
knowledge. This is one of the reasons why Python is so good - it has
been created and improved by a community who just wants to see a
better Python.
4.3.3 Object Oriented:
Python supports procedure-oriented programming as well as object-
oriented programming. In procedure-oriented languages, the program
is built around procedures or functions which are nothing but reusable
pieces of programs. In object-oriented languages, the program is built
around objects which combine data and functionality. Python has a very
powerful but simple way of doing object-oriented programming,
especially when compared to languages like C++ or Java.
Page 37 of 60
Hand Sign Detection
4.4 Screenshots
Figure 4-5 : Screenshot 1
Page 38 of 60
Hand Sign Detection
Figure 4-6: Screenshot 2
Figure 4-7: Screenshot 3
4.5 Testing
4.5.1 Strategy Used
Testing is the process of evaluation of a system to detect differences
between given input and expected output and also to assess the
feature of the system. Testing assesses the quality of the product. It is
a process that is done during the development process.
4.5.2 Test Case and Analysis
Test Case: 01 The test cases for the Hand Sign Detection platform
encompass various aspects of testing, ensuring the platform's functionality,
reliability, and user experience.
Tes Input Expected
t Image Output
Cas
e
Image with clear
1 "Hello"
"hello" sign gesture
Page 39 of 60
Hand Sign Detection
sign
detected
Image with clear
2 “Thank
"thank You Sign
Gesture” you" sign
detected
Image with clear “How are you" sign
3
"How are detected
you" sign
gesture
Image with no hand
4 No hand
present
detected
Tabel :1
Test Case: 02 Test Cases of Unit testing, Functional testing, System testing,
Integration testing, Validation Testing:
Test Case Test Phase Description Expected Actual Status
Result
ID Result
1 Unit Test Compone Component
Testing
individual nt functions Pass
components functions correctl
(e.g., correctly
feature
extraction
algorithm)
2 Functiona Verify Correct Correct Pass
l Testing functional detectio detectio
requirement n of n of
s (Detection "hello" "hello"
of specific sign sign
hand signs)
Page 40 of 60
Hand Sign Detection
3 Functiona Verify Correct Correct Pass
l Testing functional detectio detectio
requirement n of n of
s (detection "thank "thank
of specific you" you"
hand signs) sign sign
4 Functiona Test the Accurate Accurate Pass
l Testing entire detection detection
system with under under
various different different
inputs and lighting lighting
conditions conditio conditio
ns ns
5 System test the Proper Proper Pass
Testing entire handling of handling of
system with occluded occluded
various hands hands
inputs
and
conditions
6 System Test the Hand Hand Pass
Testing integration localization localization
of system integrates integrates
components with with
feature feature
extraction extraction
7 Integrati Test the Feature Feature Pass
on integration extraction extraction
Testing of system integrates integrates
components with with
gesture gesture
recognition recognition
8 Integrati Validate if Users can Users can Pass
Page 41 of 60
Hand Sign Detection
on the system accurately accurately
Testing meets user communica communica
requirement te using te using
s and hand signs hand signs
expectation
s
9 Validatio Validate if System System Pass
n the system provides provides
Testing meets user real time real time
requirement feedback feedback
s and
expectation
s
Page 42 of 60
Hand Sign Detection
Chapter 5. Conclusion
Conclusion
5.1 Conclusion
The Hand Detection project introduces an advanced solution for real-time
hand gesture tracking by integrating Media Pipe’s Holistic Model with
OpenCV for efficient frame processing and visualization. The system
accurately detects and tracks landmarks, enabling seamless gesture
recognition in various applications. By leveraging the robustness of Media
Pipe and the speed of OpenCV, the project ensures high accuracy and low
latency in real-time environments.
In the future, the system can be enhanced by incorporating models such as
YOLO, OpenPose, and Faster R-CNN to improve detection accuracy and enable
multi-hand tracking. This project aims to pave the way for gesture-based
interaction systems, offering innovative solutions for applications in virtual
reality, sign language recognition, and human-computer interaction.
5.2 Limitations of the Work
5.2.1 Limited Gesture Range: The system currently detects only basic hand
landmarks and lacks the capability to recognize complex gestures
accurately.
5.2.2 Lighting Sensitivity: Detection accuracy may decrease in low-light or
highly dynamic lighting conditions, affecting the system’s robustness.
5.2.3 Single-Person Tracking: The model primarily focuses on detecting
landmarks of a single user, making multi-person hand tracking less
efficient.
5.2.4 Hardware Dependency: Real-time processing and visualization
require high-performance hardware, limiting the system’s usability on
low- power devices.
5.3 Suggestions and Recommendations for Future Work
5.3.1 Gesture Classification: Implement machine learning models to
classify and recognize complex hand gestures for more interactive
Page 43 of 60
Hand Sign Detection
applications.
5.3.2 Multi-Person Tracking: Enhance the system to detect and track
hand landmarks for multiple users simultaneously, improving
applicability in collaborative environments.
5.3.3 Model Integration: Incorporate models like YOLO or OpenPose to
increase detection accuracy and support multi-object scenarios.
5.3.4 Lighting Adaptation: Develop adaptive preprocessing techniques to
improve system performance in varying lighting conditions.
5.3.5 Hardware Optimization: Optimize the system for low-power
devices by reducing computational overhead, enabling its use on
mobile and embedded systems.
5.3.6 Cloud-Based Processing: Integrate cloud infrastructure to enable
real- time remote processing, enhancing system scalability and
performance.
Page 44 of 60
Hand Sign Detection
Bibliography
1. Camgoz, N.C.: Sign language transformers: joint end-to-end sign language
recognition andtranslation. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and PatternRecognition, pp. 10023–10033 (2020).
2. Tian, L.: Video big data in smart city: background construction and
optimization forsurveillance video processing. Future Gener. Comput. Syst. 86,
pp. 1371–1382 (2018).
3. Mei, T., Zhang, C.: Deep learning for intelligent video analysis. In:
Proceedings of the 25thACM International Conference on Multimedia, pp. 1955–
1956 (2017).
4. Sarma, D., Bhuyan, M.K.: Methods, databases and recent advancement of
vision based handgesture recognition for HCI systems: a review. SN Comput. Sci.
2(6), 1–40 (2021).
5. Alsaffar, M., et al.: Human-computer interaction using manual hand gestures
in real [Link]. Intell. Neurosci. (2021)
6. Pu, X., & Zhou, X. (2020). Hand sign language recognition using deep
learning for speech-disabled people. Journal of Ambient Intelligence and
Humanized Computing, 11(5), 2269-2278.
DOI: 10.1007/s12652-019-01512- wAhmed, M. S., Sadik, A. S., & Alelaiwi, A.
(2019)
7. N. Adaloglou et al., "A Comprehensive Study on Deep Learning-Based
Methods for Sign Language Recognition," in IEEE Transactions on Multimedia, vol.
24, pp. 1750-1762, 2022, doi: 10.1109/TMM.2021.3070438.
8. Kothadiya, Deep, et al. “Deepsign: Sign Language Detection and Recognition
Using Deep Learning.” Electronics, vol. 11, no. 11, 3 June 2022, p. 1780,
[Link]
9. Z. Pei, Y. Wang and L. Han, "Hand Gesture Recognition with Color
Descriptors," 2020 IEEE 9th Joint International Information Technology and
Artificial Intelligence Conference (ITAIC), Chongqing, China, 2020, pp. 1822- 1826,
doi: 10.1109/ITAIC49862.2020.9338985.
10. R. Golovanov, D. Vorotnev and D. Kalina, "Combining Hand Detection and
Gesture Recognition Algorithms for Minimizing Computational Cost," 2020 22th
Page 45 of 60
Hand Sign Detection
International Conference on Digital Signal Processing and its Applications (DSPA),
Moscow, Russia, 2020, pp. 1-4, doi: 10.1109/DSPA48919.2020.9213273.
11. J. Liu, K. Furusawa, T. Tateyama, Y. Iwamoto and Y. -W. Chen, "An Improved
Hand Gesture Recognition with Two-Stage Convolution Neural Networks Using a
Hand Color Image and its Pseudo-Depth Image," 2019 IEEE International
Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 375-379, doi:
10.1109/ICIP.2019.8802970.
12. R. Kabir, N. Ahmed, N. Roy and M. R. Islam, "A Novel Dynamic Hand Gesture
and Movement Trajectory Recognition model for Non-Touch HRI Interface," 2019
IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin,
Taiwan, 2019, pp. 505-508,
doi: 10.1109/ECICE47484.2019.8942691.
13. J. Zhao, X. H. Li, J. C. D. Cruz, M. S. Verdadero, J. C. Centeno and J. M. Novelero,
"Hand Gesture Recognition Based on Deep Learning," 2023 International
Conference on Digital Applications, Transformation & Economy (ICDATE), Miri,
Sarawak, Malaysia, 2023, pp. 250-254, doi:
10.1109/ICDATE58146.2023.10248500.
14. S. Y. Boulahia, E. Anquetil, F. Multon and R. Kulpa, "Dynamic hand gesture
recognition based on 3D pattern assembled trajectories," 2017 Seventh
International Conference on Image Processing Theory, Tools and Applications
(IPTA), Montreal, QC, Canada, 2017, pp. 1-6, doi: 10.1109/IPTA.2017.8310146.
15. R. A. Bhuiyan, A. K. Tushar, A. Ashiquzzaman, J. Shin and M. R. Islam,
"Reduction of gesture feature dimension for improving the hand gesture
recognition performance of numerical sign language," 2017 20th International
Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh,
2017, pp. 1-6, doi: 10.1109/ICCITECHN.2017.8281833.
16. L. Tiantian, S. Jinyuan, L. Runjie and G. Yingying, "Hand gesture recognition
based on improved histograms of oriented gradients," The 27th Chinese Control
and Decision Conference (2015 CCDC), Qingdao, China, 2015, pp. 4211-4215, doi:
10.1109/CCDC.2015.7162670.
17. R. Hartanto, A. Susanto and P. I. Santosa, "Real time hand gesture
movements tracking and recognizing system," 2014 Electrical Power, Electronics,
Communications, Control and Informatics Seminar (EECCIS), Malang, Indonesia,
2014, pp. 137-141, doi: 10.1109/EECCIS.2014.7003734.
18. N. Adaloglou et al., "A Comprehensive Study on Deep Learning-Based
Methods for Sign Language Recognition," in IEEE Transactions on Multimedia, vol.
24, pp. 1750-1762, 2022, doi: 10.1109/TMM.2021.3070438.
Page 46 of 60
Hand Sign Detection
19. “Understanding of LSTM Networks.” GeeksforGeeks, 10 May 2020,
[Link]/understanding-of-lstm-networks/.
20. “MediaPipe Solutions API Reference.” Google for Developers,
[Link]/mediapipe/api/solutions. Accessed 1 Dec. 2023.
[15]“How Is the LSTM RNN Forget Gate Calculated?” Data Science Stack Exchange,
[Link]/questions/32217/how-is-the-lstm- rnn-forget-
gate-calculated.
21. Cheok MJ, Omar Z, Jaward MH. A review of hand gesture and sign language
recognition techniques. Int J Mach Learn Cybern. 2019;10(1):131–53.
22. Griffin, R. W. (2021). Management. Cengage Learning. Hampshire, UK. 11.
23. Chen L, Wang F, Deng H, Ji K. A survey on hand gesture recogni tion. Intern
Conf Comput Sci Appl. 2013;2013:313–6
24. . Cheok MJ, Omar Z, Jaward MH. A review of hand gesture and sign language
recognition techniques. Int J Mach Learn Cybern. 2019;10(1):131–53.
25. Chen,J. CS231A Course Project Final Report Sign Language Recognition with
Unsupervised Feature Learning. 2012. Available online:
[Link]
al/writeup/distributab le/Chen_Paper. pdf (accessed on 15 March 2022
26. Yan, S. Understanding LSTM and Its Diagrams. Available online:
[Link] diagrams-
37e2f46f1714
27. Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput.
1997, 9, 1735–1780
28. Liao, Y.; Xiong, P.; Min, W.; Min, W.; Lu, J. Dynamic Sign Language Recognition
Based on Video Sequence with BLSTM-3D Residual Networks. IEEE Access 2019, 7,
38044–38054.
29. Muthu Mariappan, H.; Gomathi, V. Real-Time Recognition of Indian Sign
Language. In Proceedings of the International Conference on Computational
Intelligence in Data Science, Haryana, India, 6–7 September 2019
Page 47 of 60
Hand Sign Detection
Project Plan
Fig. 5.1: Gantt Chart
Page 48 of 60
Hand Sign Detection
Guide Interaction Sheet
Date Discussion Action Plan
Hand Sign
Discussed about the title of the was decided
28/12/2024 Project
as the title.
20/01/2025 Discussion on the technology to Tensorflow ,
be for Hand detection in real-
used OpenCV
other andwere
tools
time finalized
05/02/2025 Discussion of the creation of Gathering of
synopsis of the project information
synopsis for
creation
Suggestions on how to do a Many research
literature
15/02/2025 survey and preliminary papers
read were
, understood
investigation
on the topic and theirwere to
abstract
be written.
Using TensorFlow
20/02/2025 Discussion on the and other
tools, we decided
implementation
the project of implement
detection.
Decided to
Design Diagrams Include the Deign
06/03/2025 Diagrams in Detail
Train Models, Validate Results Took steps for
16/03/2025 Train Models, Validate Results adding andthe
modifying
program
Final Report, Video Action taken that
Implementation ,a for each
nd Poster
26/03/2025 user an entry must
be made
in the database so
that
can becount
made easy
Decided to write
26/03/2025 Discussion on project the
and content
integrate it in
documentation the
fomatproper
of the report
Page 49 of 60
Hand Sign Detection
Table 11: Guide Interaction Sheet
Page 50 of 60
Hand Sign Detection
Appendix: A
Source Code
[Link]
1. file_backup.ipynb
Importing and Installing dependencies import cv2
import NumPy as np import os
from matplotlib import pyplot as plt import time
import mediapipe as mp
Keypoints using mp_holistic and mp_drawing
# used for bringing the holistic model through .holistic mp_holistic =
[Link] # Holistic model
# used for drawing the utilities - points and structure from the midiapipe which is
its main function through .drawing_utils
mp_drawing = [Link].drawing_utils # Drawing utilities
# creating a function so that I do not have to write the above cell again and agian
# passing two variables image and model . image from the user and the model for
mediapipe utilization
def mediapipe_detection(image, model):
# so opencv reads the image in form of bgr but for detection using mediapipe we
require the format to be RGB
# so [Link] helps in recolouring the image
image = [Link](image, cv2.COLOR_BGR2RGB) # COLOR CONVERSION BGR 2
RGB [Link] = False# Image is no longer writeable this helps us in
saving a bit
of memory
# so here image is going to be a frame from video
results = [Link](image) #Make prediction [Link]
= True # Image is now writeable # so that opencv can
produce results in bgr format
Page 51 of 60
Hand Sign Detection
image = [Link](image, cv2.COLOR_RGB2BGR) #COLOR COVERSION RGB 2
BGR return image, results
def draw_landmarks(image, results):
# so we are passing it into the .draw_landmarks resulting the producing a structure
to the image in the video
# passing the image and results with respect to the lists of the various types of
landmarks like face, left hand or right hand etc
# this will provide us with the items present in the lists with a comprehensive
details for the perticular landmarks section
# mp_holistic is allowing us to pass the image via the connection map for a
perticular landmark
# draw_landmark func does not return the image but rather applies the landmark
visualizations to the current image in place.
mp_drawing.draw_landmarks(image,results.face_landmarks,
mp_holistic.FACEMESH_CONTOURS)
# Draw face connections
mp_drawing.draw_landmarks(image,results.pose_landmarks,
mp_holistic.POSE_CONNECTIONS) # Draw pose connections
mp_drawing.draw_landmarks(image,results.left_hand_landmarks,
mp_holistic.HAND_CONNECTIONS) # Draw left hand connections
mp_drawing.draw_landmarks(image,results.right_hand_landmarks,
mp_holistic.HAND_CONNECTIONS) # Draw right hand connections
def draw_styled_landmarks(image, results):
# Draw face connections
# comes with the mediapipe a helper function mp_drawing Draws the landmarks
and the connections on the image.
mp_drawing.draw_landmarks(image,results.face_landmarks,
mp_holistic.FACEMESH_CONTOURS,
mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1), #
color
the joint
Page 52 of 60
Hand Sign Detection
the connection
mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
#color
# mp_drawing.draw_landmarks(image,
results.face_landmarks,
mp_holistic.FACEMESH_CONTOURS,
# mp_drawing.DrawingSpec(color=(80,110,10),
thickness=1, circle_radius=1), # color the joint
# mp_drawing.DrawingSpec(color=(80,256,121),
thickness=1, circle_radius=1) #color the connection
# )
# Draw pose connections
mp_drawing.draw_landmarks(image, results.pose_landmarks,
mp_holistic.POSE_CONNECTIONS, mp_drawing.DrawingSpec(color=(80,22,10),
thickness=2, circle_radius=4), mp_drawing.DrawingSpec(color=(80,44,121),
thickness=2, circle_radius=2)
)
# Draw left hand connections
mp_drawing.draw_landmarks(image,results.left_hand_landmarks,
mp_holistic.HAND_CONNECTIONS,
mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
)
# Draw right hand connections
mp_drawing.draw_landmarks(image,results.right_hand_landmarks,
mp_holistic.HAND_CONNECTIONS,
mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
)
Page 53 of 60
Hand Sign Detection
mp_holistic.POSE_CONNECTIONS cap = [Link](0)
# Set mediapipe model
# so we are accesing the mediapipe model using the with mp_holistic.Holistic
# so how the mediapipe model works is that it actuallly makes an initial detection
using the min_detection_confidence
# then track the key points with min_tracking_confidence=0.5 we can change it as
well
with mp_holistic.Holistic(min_detection_confidence=0.5,
min_tracking_confidence=0.5) as holistic: while [Link]():
# Read feed
ret, frame = [Link]()
# Make detections
# for entering the function
image, results = mediapipe_detection(frame, holistic) print(results)
# # Draw landmarks
# helps in accessing the draw_landmarks func allowing to draw landmarks through
mediapipe draw_styled_landmarks(image, results)
# Show to screen [Link]('OpenCV Feed', image)
# Break
if [Link](10) & 0xFF == ord('q'):
break [Link]()
[Link]()
[Link]() [Link]()
Extracting keypoint values # results.
# here we have acquired the results parameters from the model like the mediapipe
is effectively working
# as for eg: results.pose_landmarks.landmark[0].visibility
Page 54 of 60
Hand Sign Detection
# which helps in confirming the mediapipe model
# further here we are also conforming the "visibility" parameter of the landmark
from pose through the test array formed
len(results.pose_landmarks.landmark)
# getting 33 because mediapipe offers 33 types of landmarks for the
pose_landmarks func # like nose , ears, shoulders wrists,etc
# pose=[]
# for res in results.pose_landmarks.landmark:
# test=[Link]([res.x,res.y,res.z,[Link]]) #
[Link](test)
# so here we are basically forming an array pose which basically contains the
parameters for the landmarks in the res variable obtained from the results from
the pose_landmark
# again pose_landmarks denotes the func from the mediapipe lib and "landmark"
shows the perticular landmark value out of those 33 parameters
# so we used .flatten to make it compatible for the LSTM model used further
pose=[Link]([[res.x,res.y,res.z,[Link]] for res in
results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else
[Link](33*4)
[Link]
# conforming the shape of the array i.e 1 d
# similarly
# here we donot have the visibility parameter for the hand so ignoring it
lh=[Link]([[res.x,res.y,res.z] for res in
results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks
else [Link](21*3)
# small addition:-
# ERROR HANDLING
# now if the left hand is not present in the frame for that we need to handle the the
case. for that we can form the array of 0's
Page 55 of 60
Hand Sign Detection
# similarly
rh=[Link]([[res.x,res.y,res.z] for res in
results.left_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks
else [Link](21*3)
# similarly for face face_all_parameters=len(results.face_landmarks.landmark)*3
print(face_all_parameters)
face=[Link]([[res.x,res.y,res.z] for res in
results.face_landmarks.landmark]).flatten() if
results.face_landmarks else [Link](468*3)
def extract_keypoints(results):
pose=[Link]([[res.x,res.y,res.z,[Link]] for res in
results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else
[Link](33*4)
lh=[Link]([[res.x,res.y,res.z] for res in
results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks
else [Link](21*3)
rh=[Link]([[res.x,res.y,res.z] for res in
results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks
else [Link](21*3)
face=[Link]([[res.x,res.y,res.z] for res in
results.face_landmarks.landmark]).flatten() if
results.face_landmarks else [Link](468*3)
return [Link]([pose,lh,rh,face])
# concatenating for the model to detect the sign language
extract_keypoints(results).shape Setting up folders for collection
# Path for exported data, numpy arrays DATA_PATH=[Link]('data for different
actions')
# so what here we are going to do is that here the data will be collected and for
that
# 30 number of sequences are taken into consideration which means that 30
videos worth of data
Page 56 of 60
Hand Sign Detection
# for each sequence as well here 30 frames in length are taken into consideration
which means 30*30 data
# again here we have for ex 3 gestures so the data becomes 30*30*3
# again here we have 1662 keypoints for the landmarks obtained earlier as a result
the final data = 30*30*3*1662
# Actions that we try to detect
actions = [Link](['hello', 'thanks', 'iloveyou'])
# Thirty videos worth of data no_of_sequences = 30
# Videos are going to be 30 frames in length sequence_length = 30
# just creating the folders and sub folders
# action and seq in nested loop for forming folders
for action in actions:
for sequence in range(no_of_sequences):
try:
# makedirs used for making the sub directories
[Link]([Link](DATA_PATH, action, str(sequence)))
except:
pass
Collecting keypoint values for Training nd Testing
# Set mediapipe model cap = [Link](0) #
with mp_holistic.Holistic(min_detection_confidence=0.5,
min_tracking_confidence=0.5) as holistic: # NEW LOOP
# Loop through actions
# this loop is for specifications of the number of times the frames need to be saved
for every sequence w.r.t every action
for action in actions:
Page 57 of 60
Hand Sign Detection
# Looping through sequences aka videos for sequence in range(no_of_sequences):
# Looping through video length aka sequence length (mentioned above) - can be
changed for frame_num in range(sequence_length):
# Reading the frame from the video. This frame will be used for further analysis
ret, frame = [Link]()
# Making the detections using the mediapipe_detection func where the model will
be able to process the frame like BGR to RGB
image, results = mediapipe_detection(frame, holistic) # print(results)
# Drawing landmarks on the acquired frame draw_styled_landmarks(image,
results)
# logic is for the formating portion if frame_num == 0:
[Link](image, 'STARTING COLLECTION', (120,200),
cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255, 0), 4, cv2.LINE_AA)
[Link](image, 'Collecting frames for {} Video Number {}'.format(action,
sequence),
# providing the break for adjusting the posture [Link](2000) #2 sec
else:
[Link](image, 'Collecting frames for {} Video Number {}'.format(action,
# NEW Export keypoints
# now for this frame or results obtained from mediapipe_extraction which will be
RGB the keypoints will be extracted from the extract_keypoint func in the form of a
1 d array.
# again in ideal senario the no of extracted points will be 1662 as found keypoints
= extract_keypoints(results)
# providing the path for the save
npy_path = [Link](DATA_PATH, action, str(sequence), str(frame_num)) #
saving the array in the location
[Link](npy_path, keypoints)
# Break for this frame and continue for next respective iteration if [Link](10)
& 0xFF == ord('q'):
break
# when loop ends the window closes [Link]() [Link]()
Page 58 of 60
Hand Sign Detection
[Link]() [Link]()
Preprocessing data and creating labels w.r.t. actions
from sklearn.model_selection import train_test_split from [Link]
import to_categorical
# creating a dict where label is mapped with a num by def starting from 0
# enumerate calling the next label from the actions list , earlier created as the array
label_map = {label:num for num, label in enumerate(actions)}
label_map
# forming two arrays named as mentioned
# just as a graph where seq will contain all the data with respect to all the videos
and frames recorded during the training
# and labels for denoting the actions
# now again this is our main task sequences, labels = [], []
# 3 actions so 3 iterations for action in actions:
# 30 videos with respect to each action so 30 iterations for sequence in
range(no_of_sequences):
# forming a blank array for storing that x data of all the collection done till now
window = []
# for each frame recorded in each of the seq for frame_num in
range(sequence_length):
# so res is basically helping in loading the data for each frame through the os lib
res = [Link]([Link](DATA_PATH, action, str(sequence),
"{}.npy".format(frame_num))) # frame_num shows the exact name to counter in
the loop
# appending the window array with res [Link](res)
# now the loop for the frames for a perticular seq is over
# adding the window data in the sequences array as a 2 d array with one
parameter as each frame and second one as keypoints -1662
[Link](window)
Page 59 of 60
Hand Sign Detection
# appending the labels array only once with only action i.e the action running the
loop
# values are going to be 1d because action is just a label which is currently 0 or 1
or 2 running action * seq times , here 30
# not any data containing 2 d array [Link](label_map[action])
[Link](sequences).shape X = [Link](sequences)
[Link]
# changing the labels from 0,1,2 to categorical data for easier accessebility y =
to_categorical(labels).astype(int)
y
# so spliting the data into train and test with 5 percent of testing
# data contains the seq with frames and keypoints respectively in form of a 3 d
array X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
X_test.shape
Building and training LSTM neural network
from [Link] import Sequential from [Link]
import LSTM, Dense from [Link] import TensorBoard
# adding the logs folder log_dir = [Link]('Logs')
# tensorboard is a part of tensorflow monitoring the model training using a web
app # will help to track the accuracy during the training
tb_callback = TensorBoard(log_dir=log_dir) # neural network
# adding sequential API cuz it will allow in building the model fluidly model =
Sequential()
# adding the three layers of LSTM consisting of 3 positional argument and 1
keyword argument # positional arg - depends on the position of the value. wrong
position wrong output
# keyword arg - depends w.r.t the value assigned with the variable
# returning sequence is necessery because here if not then the next lstm layer will
not follow the prev layer
# adding 65 units in first layer and so on . activation is relu
Page 60 of 60
Hand Sign Detection
# input shape is 30,1662 for each video i.e 30 frames and 1662 keypoints
[Link](LSTM(64, return_sequences=True, activation='relu',
input_shape=(30,1662))) [Link](LSTM(128, return_sequences=True,
activation='relu'))
# return seq as false cuz next is dense layer so not required [Link](LSTM(64,
return_sequences=False, activation='relu'))
# adding 64 units for dense layer [Link](Dense(64, activation='relu'))
[Link](Dense(32, activation='relu'))
# actions is having three values so the [Link] of [0] is also 3 in shape
# using softmax so that the values are confined in 0 to 1 the value will sum up and
provide 1 [Link](Dense([Link][0], activation='softmax'))
# eg
eg_res = [.7, 0.2, 0.1] actions[[Link](eg_res)]
# using the adam optimizer
# categorical_crossentropy for multiclasss classification # metrics for evaluation
[Link](optimizer='Adam', loss='categorical_crossentropy',
metrics=['categorical_accuracy'])
[Link](X_train, y_train, epochs=330, callbacks=[tb_callback]) # tensorboard --
logdir=.
[Link]()
8. Making the predictions res=[Link](X_test)
# again the actions with the max value provided by softmax is returned
actions[[Link](res[0])]
actions[[Link](y_test[4])]
Saving weights for future accessability [Link]('action.h5')
Page 61 of 60
Hand Sign Detection
del model model.load_weights('action.h5')
Evaluation using Confusion Matrix and Accuracy score
from [Link] import multilabel_confusion_matrix, accuracy_score yhat =
[Link](X_train)
# so here we are check the results w.r.t the axis - 1 i.e the row no 1 having the
values of actions i.e 3 values
# then converting them in list format and finding the max value ytrue =
[Link](y_train, axis=1).tolist()
# one hot encoding
yhat = [Link](yhat, axis=1).tolist() yhat
# confution matrix multilabel_confusion_matrix(ytrue, yhat)
accuracy_score(ytrue, yhat)
FINAL Testing in real time # for coloring the actions
colors = [(245,117,16), (117,245,16), (16,117,245)]
# results from the model prediction, actions, image from the video, colors from
above def prob_viz(res, actions, input_frame, colors):
output_frame = input_frame.copy()
# so here prob can be obtained from the softmax from earlier - have 3 values for
num, prob in enumerate(res):
# .rectangle for formation of rectangle
# here the 2nd parameter denotes the position of the color where num can be 0, 1,
2 based on the action and changes the y axis accordingly
# int prob for x input will help in setting the bar length based on the accuracy of
the model prediction and y axis same as above and
# colors will call the colors function based on the num (of action) # -1 for filling up
the box
[Link](output_frame, (0,60+num*40), (int(prob*100), 90+num*40),
colors[num], -1) [Link](output_frame, actions[num], (0, 85+num*40),
cv2.FONT_HERSHEY_SIMPLEX, 1,
(255,255,255), 2, cv2.LINE_AA)
Page 62 of 60
Hand Sign Detection
return output_frame
# seq will collect the 30 frames for prediction sequence=[]
# concatinate the seq from history sentence=[]
predictions=[] threshold=0.5
cap = [Link](0) # Set mediapipe mqodel
# so we are accesing the mediapipe model using the with mp_holistic.Holistic
# so how the mediapipe model works is that it actuallly makes an initial detection
using the min_detection_confidence
# then track the key points with min_tracking_confidence=0.5 we can change it as
well
with mp_holistic.Holistic(min_detection_confidence=0.5,
min_tracking_confidence=0.5) as holistic: while [Link]():
# Read feed
ret, frame = [Link]()
# Make detections
# for entering the function
image, results = mediapipe_detection(frame, holistic) print(results)
# # Draw landmarks
# helps in accessing the draw_landmarks func allowing to draw landmarks through
mediapipe draw_styled_landmarks(image, results)
# ==> prediction logic
# extracting the keypoints for the seq keypoints=extract_keypoints(results)
[Link](keypoints)
# [Link](0,keypoints)
# here we are collecting the last 30 frames to generate the predictions
sequence=sequence[-30:]
# so if sequence is 30 ie 30th seq then only the prediction value is entered like the
if clause below
if len(sequence)==30:
Page 63 of 60
Hand Sign Detection
# here we are predicting based on the 30 frames and here perticularly
np.expand_dims is allowing us to grab the values
# which are basically utilized for the perticular action
# like for eg we see the shape of the X_train then the original shape is going to be 3
d arr unless we use .tolist and as a result we cannot access
# the value present at axis=0 ie for the actions for that perticular seq for this
reason np.expand_dims help in accessing the values
res=[Link](np.expand_dims(sequence,axis=0))[0] # printing using the
softmax prediction print(actions[[Link](res)])
# appending the black arr [Link]([Link](res))
# ==> visualization logic
# so here basically the last 10 values from the prediction arr are being check with
the res # providing a more better accuracy for predictions
if [Link](predictions[-10:])[0]==[Link](res):
if res[[Link](res)] > threshold:
if len(sentence) > 0:
# here in the below line we are checking that the previous word is not the same as
of the last word of the sent
if actions[[Link](res)] != sentence[-1]:
[Link](actions[[Link](res)])
else:
[Link](actions[[Link](res)])
if len(sentence) > 5:
# so last 5 values will be displayed not resulting in the clutter of words on the
screen sentence = sentence[-5:]
# ==> Viz probabilities
# entering the prob func for the better visualization image = prob_viz(res, actions,
image, colors)
[Link](image, (0,0), (640, 40), (245, 117, 16), -1)
[Link](image, ' '.join(sentence), (3,30), cv2.FONT_HERSHEY_SIMPLEX, 1,
(255, 255, 255), 2, cv2.LINE_AA)
# Show to screen [Link]('OpenCV Feed', image)
# Break
Page 64 of 60
Hand Sign Detection
if [Link](10) & 0xFF == ord('q'): break
[Link]() [Link]()
[Link]() [Link]()
Page 65 of 60
Hand Sign Detection
Appendix: B
Hand Sign Detection User Manual
1. Introduction
Welcome to the User Manual for the Hand Sign Detection System. This
manual provides step-by-step instructions for setting up, using, and
troubleshooting the system. The Hand Sign Detection System utilizes
advanced machine learning algorithms to recognize and interpret hand
gestures, making communication and accessibility seamless
2. System Requirements:
2.1. Hardware Requirements
RGB or depth cameras for hand gesture input.
Minimum 8GB RAM for smooth operation.
GPU (e.g., NVIDIA GTX 1050 or better) for efficient processing.
500GB storage for models and datasets.
2.2. Software Requirements
Operating System: Windows 10/11, macOS, Linux.
Libraries: TensorFlow, PyTorch, OpenCV for machine learning.
Python programming language.
Integrated development environments (IDE): PyCharm or VS Code.
3. Installation and Setup
3.1. Installation
Connect the required camera/sensor to your computer.
Download the Hand Sign Detection software from the provided
source or website.
Run the installation file and follow the on-screen instructions.
Install dependencies using pip install -r [Link] for Python
libraries.
Page 66 of 60
Hand Sign Detection
3.2. Calibration
Launch the application and go to the calibration menu.
Position your hands within the detection area as directed by the
system prompts.
Perform sample gestures for sensitivity and accuracy adjustments.
4. Using the Hand Sign Detection System
4.1. Starting the System
Open the Hand Sign Detection application.
Ensure the camera is active and positioned to capture gestures
effectively.
Click "Start Detection" to begin recognizing gestures.
4.2. Supported Features
Gesture Recognition: Detects letters, numbers, and commonly used
phrases.
Real-Time Feedback: Displays the recognized gestures as text or
audio output.
Multilingual Support: Translates gestures into multiple languages.
4.3. Gesture Library
Access the in-app library to view supported gestures such as:
Letters (A–Z)
Numbers (0–9)
Common phrases like "Hello," "Thank You," "How are you?"
5. Key Features
High Accuracy: Reliable recognition with minimal false positives.
Real-Time Processing: Immediate detection and feedback.
Cross-Platform Compatibility: Functions on various devices and operating
systems.
Ease of Use: Simple interface suitable for non-technical users.
6. Troubleshooting
Page 67 of 60
Hand Sign Detection
6.1. Common Issues
Problem Cause Solution
Gesture not Poor lighting or Improve lighting or
recognized occluded hand visibility.
System lag Insufficient hardware Upgrade GPU or RAM.
resources
Incorrect output Improper calibration Recalibrate the
system.
6.2. Maintenance
Keep the camera lens clean for optimal input.
Regularly update the software to the latest version.
Back up your gesture dataset and model files.
1. Safety Guidelines
Operate the system in a well-lit environment for accurate
detection.
Avoid prolonged use of the system to reduce eye and hand strain.
Ensure the camera is securely mounted to prevent movement
during operation.
7. FAQs
Q: Can I add new gestures to the system?
A: Yes, by updating the dataset and retraining the model.
Q: Is the system mobile-compatible?
A: The software can be optimized for mobile devices with additional
adjustments.
Q: What is the accuracy rate of the system?
A: The system achieves over 90% accuracy under optimal condition
Page 68 of 60
Hand Sign Detection
Appendix: C
Previous Research Paper/ Certificates
Page 69 of 60
Hand Sign Detection
Page 70 of 60
Hand Sign Detection
Page 71 of 60