
A PROJECT REPORT

On

Hand Gesture Control Using Computer Vision

Submitted by

ARJUN MARANDI (10800223046)


ARGHA MUKHERJEE (10800223044)
AYAN PAL (10800223057)
RANA DUTTA (10800223116)

Under the Guidance of

SUDIP KUMAR DE
(Assistant Professor)

Information Technology

Asansol Engineering College


Asansol

Affiliated to

MAULANA ABUL KALAM AZAD UNIVERSITY OF TECHNOLOGY

05/2023

CERTIFICATE

Certified that this project report on "Hand Gesture Control Using Computer Vision" is the bonafide
work of Arjun Marandi (10800223046), Argha Mukherjee (10800223044), Ayan Pal (10800223057),
and Rana Dutta (10800223116), who carried out the project work under my supervision.

__________________________________          _________________________________
SUDIP KUMAR DEY                              AVISHEK BANERJEE
Assistant Professor                          B.Tech in Information Technology

Information Technology
Asansol Engineering College
Asansol

ACKNOWLEDGEMENT

It is our great privilege to express our profound and sincere gratitude to our Project Supervisor,
Sudip Kumar De, Assistant Professor, for providing us with very cooperative and precious guidance
at every stage of the present project work carried out under his supervision. His valuable advice
and instructions in carrying out the present study have been a very rewarding and pleasurable
experience that has greatly benefited us throughout our work.

We would also like to pay our heartiest thanks and gratitude to <Name of HoD>, HoD, and all the
faculty members of the Department of Information Technology, Asansol Engineering College, for the
various suggestions provided in attaining success in our work.

Finally, we would like to express our deep sense of gratitude to our parents for their constant
motivation and support throughout our work.

……………………………………………
(Arjun Marandi)

……………………………………………
(Argha Mukherjee)

……………………………………………
(Ayan Pal)

……………………………………………
(Rana Dutta)

Date: <DD/MM/YYYY>
Place: Asansol

3rd Year, Information Technology

CONTENT

Certificate
Acknowledgement
Content
List of Figures/Tables
Project Synopsis
1. Introduction
   1.1 Motivation
2. Project Details
   2.1 Methodology
   2.2 Algorithm Used
   2.3 Challenges
   2.4 Solutions to the Challenges
   2.5 Comparison with Other Methods
Discussion
Applications
Limitations and Future Scope
LIST OF TABLES

Table 1   Comparison of gesture recognition methods (Section 2.5)

LIST OF FIGURES

Figure 1   Workflow diagram of the proposed system (Section 2.1.2)
Project Synopsis

This project, "Hand Gesture Control Using Computer Vision," aims to create an
intuitive and natural way for humans to interact with computers using hand gestures
captured by a webcam. It is a comprehensive system that leverages computer vision
and machine learning to interpret and translate hand movements into commands. This
project bypasses traditional input methods like a keyboard and mouse, offering a
hands-free, seamless user experience.
Key Features:
- Real-time Hand Tracking: Accurately tracks the user's hand landmarks in real time using a
  camera feed.
- Gesture Recognition: Recognizes a predefined set of directional gestures (Up, Down, Left,
  Right, Forward), with scope to extend to poses such as 'Fist', 'Open Palm', and 'Peace Sign'.
- Directional Commands: Detects the direction of gestures (e.g., pointing up, down, left, right)
  to execute specific commands.
- Application Integration: The system can be integrated to control various applications, such as
  a media player, a presentation, or a simple game.

Technologies Used:
- Video Capture and Processing: Python with OpenCV for video capture and image processing.
- Core Logic: MediaPipe Hands for robust hand landmark detection and NumPy for mathematical
  calculations on landmark coordinates.
- Algorithms: Custom-built rule-based logic that interprets landmark positions and recognizes
  distinct gestures based on the relative positions of fingertips and joints.

Project Goals:
- Enhance User Interaction: Provide a natural and intuitive interface for human-computer
  interaction.
- Improve Accessibility: Offer a new input modality for people with limited mobility or in
  scenarios where touch-based interaction is impractical.
- Ensure Accuracy: Achieve high accuracy in gesture recognition even in varying lighting
  conditions.
- Scalability: Design a system that can be expanded to include more complex gestures and
  applications.
This project aspires to revolutionize the way we interact with technology by offering
an advanced, secure, and user-friendly platform that makes gesture control a viable and
practical input method.

1. Introduction

In an increasingly digital world, the demand for more natural and intuitive interfaces
between humans and computers is growing. Hand gesture control, powered by
computer vision, represents a significant step in this evolution. This technology allows
users to interact with digital systems using the natural movements of their hands,
eliminating the need for physical contact with devices. This project aims to develop a
robust, real-time hand gesture recognition system that can be used to control various
applications and interfaces.
Current human-computer interaction (HCI) methods, such as keyboards and mice, are
effective but can be limiting in certain environments, like surgical rooms, public
displays, or for individuals with motor impairments. Hand gesture recognition
addresses these limitations by offering a hands-free, touch-less, and versatile
alternative. Our system leverages state-of-the-art computer vision libraries to not only
detect a hand but also identify specific gestures and their directional intent.
The core of our system is built on MediaPipe Hands, a powerful machine learning
solution from Google that provides 21 key landmarks on a hand from a single image.
By analyzing the relative positions of these landmarks, we can accurately determine
the hand's pose and interpret it as a specific command. For example, by checking the y-
coordinates of the finger tips relative to their base joints, we can classify whether a
finger is extended or curled.
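As an illustration, below is a minimal Python sketch of this extended-versus-curled check. The landmark indices follow the standard MediaPipe hand model; the helper name is our own illustrative choice, not part of the library:

    # Minimal sketch: decide which fingers are extended, assuming `hand_landmarks`
    # is one detected hand from MediaPipe Hands. Image y-coordinates grow downward,
    # so a fingertip above its middle (PIP) joint has a smaller y value.
    FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky tips
    FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP joints

    def extended_fingers(hand_landmarks):
        lm = hand_landmarks.landmark
        return [lm[tip].y < lm[pip].y for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)]

For example, an 'Index Up' pose would correspond to the list [True, False, False, False]; the thumb is ignored here for simplicity because its joint geometry differs from the other fingers.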
Our mission is to create a system that is not only highly accurate but also lightweight and
responsive. The system is designed to be easily integrated into different applications, from
controlling a media player to navigating a presentation. By simplifying the gesture recognition
process and focusing on performance, we aim to make gesture control a practical, everyday input
method for human-computer interaction.

1.1 Motivation

The development of this project was driven by the observed inefficiencies and
limitations in existing human-computer interaction (HCI) methods. Many platforms
and systems lack user-friendly interfaces, offer limited customization options, and fail
to provide a seamless experience for hands-free or touch-less control. This project aims
to address these shortcomings by creating a robust, dynamic system for hand gesture
control.
Furthermore, the existing interaction methods often fail to provide a seamless and
cohesive experience in specific contexts. Users may encounter disjointed processes,
such as the need to constantly switch between physical and digital inputs, which can
result in a fragmented and unsatisfactory journey. This can lead to decreased user
satisfaction and missed opportunities for more fluid interaction.
Beyond interface design, users continue to face challenges such as the lack of hands-free
operation, limited real-time responsiveness, and disconnected workflows across different devices
and platforms. For instance, in a smart home, users might struggle to control multiple devices
without a physical remote. These gaps highlight a clear demand for a modern, integrated, and
user-focused platform that enhances convenience and user control.
Another key motivation was to address the lack of personalization and responsiveness
in existing systems. Hand gestures are a natural form of communication, yet most
systems offer rigid or static responses. With our project, we aimed to create a smart
and adaptive system where user gestures drive the experience. The goal was to
empower users with intelligent tools and a clean interface that simplifies interactions
and reduces the time and effort required to perform tasks.

2. Project Details

Here is a detailed overview of our project, covering the core aspects of its development. This
section outlines the methodology (2.1), the algorithm used (2.2), the challenges we faced and how
we mitigated them (2.3 and 2.4), and a comparison with other methods together with our results
and discussion (2.5).
2.1 Methodology
The development methodology followed in building the hand gesture control system is structured,
modular, and agile-driven. It emphasizes iterative progress, continuous feedback, and scalability
in design and deployment. The methodology covers the system design and architecture (2.1.1) and
the development workflow (2.1.2).
2.1.1 System Design and Architecture
The architecture of our system is designed to provide a seamless, secure, and scalable user
experience. It is organized into three functional layers: input (webcam capture), core logic
(landmark detection and gesture recognition), and output (on-screen feedback and the recognized
gesture). Key components include:
Core Logic: Built using Python with a focus on libraries like OpenCV and MediaPipe,
the core logic serves as the application's engine, handling video capture, hand
landmark detection, and gesture recognition. The system is lightweight and designed to
perform real-time processing.
Input/Output: The input is a live video feed from a webcam. The output is a real-time
display of the recognized gesture and the processed video feed with landmarks drawn
on the hand for visual feedback.
Data Handling: The system processes data from the webcam in real-time, with no long-
term data storage. Temporary data related to landmark coordinates is used to perform
calculations and is discarded after each frame is processed.
Security Enhancements: The system is designed for local processing, meaning no
personal data or video frames are sent over a network, ensuring user privacy and data
security.
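The following is a minimal initialization sketch of these components, assuming the standard opencv-python and mediapipe packages; the confidence thresholds are illustrative values, not tuned results:

    import cv2
    import mediapipe as mp

    # Input layer: live webcam feed (device 0).
    cap = cv2.VideoCapture(0)

    # Core logic layer: MediaPipe Hands runs entirely on the local machine,
    # so no frames or personal data leave the computer.
    hands = mp.solutions.hands.Hands(max_num_hands=1,
                                     min_detection_confidence=0.7,
                                     min_tracking_confidence=0.5)

    # ... per-frame processing loop goes here (see Section 2.2.1) ...

    hands.close()
    cap.release()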

2.1.2 Development Workflow


The workflow diagram of our proposed system is shown in Figure 1.
The development of the hand gesture control system followed an Agile Software
Development Life Cycle (SDLC) model, allowing for iterative progress, constant
feedback, and flexibility in accommodating evolving requirements. This approach
ensured that each phase was handled systematically while encouraging collaboration
among team members. The workflow includes:
I. Requirement Gathering: This phase involved identifying the needs and expectations
of the target users - those who would benefit from a hands-free interface. We also
reviewed existing systems to identify common limitations and missing features. A
comprehensive requirement specification document was created to capture functional,
non-functional, and technical requirements.
II. Prototyping: We created a basic Python script to test the camera input and the
display of video frames. This allowed us to visualize the user experience before diving
into the core logic.
III. Implementation: Development was conducted in modules: video capture, landmark
detection, gesture recognition logic, and output display. Git version control was used to
manage collaboration.
IV. Testing: Both manual and automated testing were done to ensure functionality,
responsiveness, and accuracy. We tested the system under different lighting conditions
and with various hand sizes. Bugs were tracked and resolved.
V. Deployment: The application is designed to be a standalone, executable script. It
can be run on any system with a webcam and the required libraries installed.
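As a rough sanity check before running the script, the dependencies can be installed with pip (opencv-python, mediapipe, numpy) and verified with a short Python snippet such as the one below; this is only an illustrative check, not part of the main application:

    import cv2
    import mediapipe
    import numpy

    cap = cv2.VideoCapture(0)
    print("OpenCV version:", cv2.__version__)
    print("MediaPipe version:", mediapipe.__version__)
    print("NumPy version:", numpy.__version__)
    print("Webcam available:", cap.isOpened())
    cap.release()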

2.2 Algorithm Used (Overview)

The algorithm used in this project is rule-based and very simple. Instead of using
heavy machine learning models, it only looks at the relative position of two key points:
- Wrist (landmark 0)
- Index finger tip (landmark 8)

By comparing their x and y coordinates, the system can understand in which direction
the finger is pointing. This is enough to recognize five gestures: Up, Down, Left,
Right, Forward.

Step-by-Step Algorithm
1. Frame Capture
   - Read an image frame from the webcam.
   - Flip the frame horizontally so it feels natural, like looking into a mirror.
2. Hand Landmark Detection
   - Use MediaPipe Hands to detect the hand in the frame.
   - If a hand is found, get all 21 landmarks (points).
3. Select Important Landmarks
   - Landmark 0 → Wrist (the reference point).
   - Landmark 8 → Index finger tip (the pointing finger).
4. Calculate Position Difference
   - Compute the differences:
     x_diff = index_tip.x - wrist.x
     y_diff = index_tip.y - wrist.y
5. Decide Gesture Based on Position
   - If the finger is more to the right → Right
   - If the finger is more to the left → Left
   - If the finger is above the wrist → Up
   - If the finger is below the wrist → Down
   - If the finger is close to the wrist → Forward
6. Show Output
   - Print the gesture on screen.
   - Draw landmarks for user feedback.
This simple logic makes the system fast and beginner-friendly, while still giving useful
gesture recognition.
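The decision rule above can be written as a small, self-contained Python function. This is a sketch of the classification logic only; the 0.1 threshold follows the pseudocode in Section 2.2.1, and the function name is our own:

    def classify_direction(x_diff, y_diff, threshold=0.1):
        # x_diff and y_diff are normalized offsets of the index tip from the wrist.
        # Image y grows downward, so a tip above the wrist gives a negative y_diff.
        if abs(x_diff) < threshold and abs(y_diff) < threshold:
            return "Forward"                       # finger close to the wrist
        if abs(x_diff) > abs(y_diff):              # horizontal movement dominates
            return "Right" if x_diff > 0 else "Left"
        return "Down" if y_diff > 0 else "Up"      # vertical movement dominates

For example, classify_direction(0.25, 0.02) returns "Right", while classify_direction(0.03, -0.2) returns "Up".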

2.2.1 Pseudocode of the Algorithm

To make the algorithm easy to understand, here is the pseudocode in plain English
with simple logic.

START
    Open webcam

    LOOP until user quits:
        Capture frame from webcam
        Flip the frame for mirror effect
        Convert frame to RGB (needed for MediaPipe)

        Detect hand landmarks using MediaPipe

        IF hand is detected:
            Get coordinates of Wrist (landmark 0)
            Get coordinates of Index Finger Tip (landmark 8)

            Calculate x_diff = index.x - wrist.x
            Calculate y_diff = index.y - wrist.y

            IF absolute(x_diff) > absolute(y_diff):
                IF x_diff > 0.1 THEN Gesture = "Right"
                ELSE IF x_diff < -0.1 THEN Gesture = "Left"
            ELSE:
                IF y_diff > 0.1 THEN Gesture = "Down"
                ELSE IF y_diff < -0.1 THEN Gesture = "Up"

            IF differences are very small THEN Gesture = "Forward"

            PRINT Gesture
        END IF

        Display the frame with landmarks
    END LOOP
STOP

Explanation (Step-by-Step)
1. Start & Webcam Initialization
   - The algorithm begins by opening the default webcam of the computer.
   - This provides a continuous stream of video frames for processing.
2. Main Loop
   - A loop is started so the system keeps working until the user quits (usually by pressing q).
3. Frame Capture
   - One frame (single image) is captured from the webcam feed.
4. Flip Frame
   - The captured frame is flipped horizontally.
   - This makes the movements look natural, like looking into a mirror.
5. Convert to RGB
   - MediaPipe requires RGB images instead of OpenCV's default BGR format.
   - So the frame is converted from BGR → RGB.
6. Hand Landmark Detection
   - MediaPipe processes the RGB frame.
   - If a hand is detected, it returns 21 landmarks (points).
   - Each landmark has an (x, y) coordinate.
7. Select Wrist and Index Tip
   - Landmark 0 → Wrist
   - Landmark 8 → Index Finger Tip
   - These two points are enough to decide the finger's direction.
8. Calculate Differences
   - x_diff = index.x - wrist.x → measures horizontal movement.
   - y_diff = index.y - wrist.y → measures vertical movement.
9. Gesture Decision
   - If the horizontal difference (x_diff) is bigger → left/right movement.
     x_diff > 0.1 → Right
     x_diff < -0.1 → Left
   - Else, if the vertical difference (y_diff) is bigger → up/down movement.
     y_diff > 0.1 → Down
     y_diff < -0.1 → Up
   - If both differences are very small → Forward.
10. Print Gesture
   - The recognized gesture is printed in the console.
   - Example: "Up", "Down", "Left", "Right", or "Forward".
11. Show Frame with Landmarks
   - The hand landmarks and connecting lines are drawn on the video frame.
   - This gives visual feedback to the user.
12. Loop Back
   - The system goes back to capture the next frame.
   - The process continues until the user presses the quit key.
13. Stop
   - When the loop ends, the webcam is released, and the program stops.
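Putting the above steps together, here is a minimal, self-contained Python sketch of the whole loop. It assumes a local machine with a webcam and the opencv-python and mediapipe packages installed; the thresholds and window name are illustrative choices consistent with the pseudocode above:

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils

    cap = cv2.VideoCapture(0)                                # open the default webcam
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.7) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frame = cv2.flip(frame, 1)                       # mirror effect
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)     # MediaPipe expects RGB
            result = hands.process(rgb)

            gesture = "No hand"
            if result.multi_hand_landmarks:
                hand = result.multi_hand_landmarks[0]
                wrist, tip = hand.landmark[0], hand.landmark[8]
                x_diff = tip.x - wrist.x
                y_diff = tip.y - wrist.y
                if abs(x_diff) < 0.1 and abs(y_diff) < 0.1:
                    gesture = "Forward"                      # finger close to the wrist
                elif abs(x_diff) > abs(y_diff):
                    gesture = "Right" if x_diff > 0 else "Left"
                else:
                    gesture = "Down" if y_diff > 0 else "Up" # image y grows downward
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)

            print(gesture)                                   # console output
            cv2.putText(frame, gesture, (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.imshow("Hand Gesture Control", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):            # quit key
                break

    cap.release()
    cv2.destroyAllWindows()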

2.3 Challenges

During the development of this project, we faced a number of challenges. Since it is a real-time
computer vision project, many factors like lighting, background, and camera quality affect the
accuracy of detection. Here are the main challenges:

1. Lighting Issues
   - In dark rooms, the camera fails to capture clear hand details.
   - In very bright sunlight, shadows and glare disturb landmark detection.
2. Background Problems
   - If the background has many objects, the algorithm sometimes detects false points.
   - For example, a poster or object behind the hand may cause errors.
3. Camera Quality
   - Cheap webcams produce blurry images.
   - This reduces the performance of MediaPipe.
4. Real-Time Processing in Colab
   - Google Colab does not support continuous webcam streaming.
   - Only single frames can be captured and shown.

2.4 Solutions to the Challenges

To solve the above challenges, we applied the following solutions:

1. Lighting Fix
   - Always use front-facing light on the hand.
   - Avoid sitting against a bright window.
2. Background Fix
   - Use a plain background (e.g., a wall).
   - This makes hand detection more stable.
3. Camera Fix
   - Use at least a 720p webcam for better results.
   - In Colab, accept that the resolution may be lower.
4. Colab Limitation Fix
   - In Colab, run the code to capture a snapshot of your hand and process that single frame
     (see the sketch below).
   - For full real-time video, use VS Code or PyCharm on your PC.

These fixes made the system work more smoothly in simple environments.
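The following is a minimal sketch of the Colab workaround: instead of a live stream, a single snapshot image is processed with MediaPipe's static image mode. The file name 'snapshot.jpg' is a hypothetical placeholder for whatever image the Colab camera-capture cell saves:

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    image = cv2.imread("snapshot.jpg")               # snapshot saved from Colab
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)     # MediaPipe expects RGB

    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(rgb)

    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        x_diff = lm[8].x - lm[0].x                   # index tip vs wrist
        y_diff = lm[8].y - lm[0].y
        print("x_diff:", round(x_diff, 3), "y_diff:", round(y_diff, 3))
    else:
        print("No hand detected in the snapshot.")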

2.5 Comparison with Other Methods


Our project is very simple, but there are other gesture recognition methods. Let us compare them.

Table 1: Comparison of gesture recognition methods

    Method            Cost        Accuracy    Hardware Needed   Ease of Use
    Glove-based       High        Very High   Special glove     Hard
    Color tracking    Low         Low         Webcam only       Easy
    Deep learning     Medium      High        GPU               Hard
    Our Project       Very Low    Good        Webcam only       Very Easy

Observations:
- The system works best for Up, Down, and Forward.
- Left and Right sometimes give wrong results if the hand is tilted.
- In proper lighting, accuracy is above 85% for the basic gestures.

Discussion

From the results, we can conclude that:

- The system is fast and simple.
- It works well for basic directions (Up, Down, Forward).
- Left and Right recognition can be improved with clearer rules.
- It works better on a local computer (VS Code) than in Colab.
- For students and beginners, this project is a good introduction to computer vision.

Applications

This project can be used in many areas:


1. Education
   - Teachers can move presentation slides using gestures.
2. Gaming
   - Games can be controlled without a joystick.
3. Accessibility
   - Helps people with physical disabilities control computers.
4. Smart Devices
   - Gesture-based control of lights, music, or fans.

Limitations & Future Scope

Limitations
- Works only with one hand.
- Needs good lighting.
- Only five gestures are supported.
- Not suitable for complex sign language.

Future Scope
- Add more gestures such as fist, palm, and peace sign.
- Improve stability for left/right gestures.
- Support multi-hand tracking.
- Use machine learning models for advanced recognition.
