
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
THAPATHALI CAMPUS

A Major Project Proposal


On
DartVision: AI-Driven Dart Targeting System Using Facial Recognition

Submitted By:
Kapur Pant (THA077BEI021)
Khagendra Raj Joshi (THA077BEI022)
Kiran Chand (THA077BEI023)
Mohit Bhusal (THA077BEI025)

Submitted To:
Department of Electronics and Computer Engineering
Thapathali Campus
Kathmandu, Nepal

June, 2024
ABSTRACT

The goal of the "DartVision: AI-Driven Dart Targeting System Using Facial Recognition" project is to use computer vision and artificial intelligence to create a novel dart-throwing mechanism that can precisely target human subjects. The system makes use of facial recognition algorithms for target identification, servo motors for precise dart control, and a depth camera for target detection and depth perception. The device accomplishes precise dart targeting by combining projectile motion principles with energy conservation laws. Non-lethal darts and strict safety standards are used to ensure user safety. In addition to recreational sports, the DartVision technology has potential uses in military defense systems for locating and engaging specific targets in crowds. Extensive testing and validation procedures will be carried out to evaluate the system's efficiency and dependability in many situations. This project offers a fresh take on dart-aiming systems and improves accuracy and security in dart-related activities.

Keywords: AI-driven, Computer vision, DartVision, Depth Camera, Face detection, Projectile motion, Throwing mechanism

Table of Contents

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.4 Project Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6 Project Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6.1 Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2. LITERATURE REVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3. REQUIREMENT ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1.1 ESP32cam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1.2 Stepper motor(STH-39H112-06/NEMA17-42HS40) . . . . . . 9

3.1.3 Servo motor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1.4 Motor Driver (A4988) . . . . . . . . . . . . . . . . . . . . . . 10

3.2 Software Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.2.1 Jupyter Notebook . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.2.2 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.2.3 YOLOv8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4. SYSTEM ARCHITECTURE AND METHODOLOGY . . . . . . . . . . 13

4.1 Theoretical Background . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.2 System Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.2.1 Computer Processing . . . . . . . . . . . . . . . . . . . . . . . 14

4.3 System Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.4 Dataset Training Block Diagram . . . . . . . . . . . . . . . . . . . . . 24

4.5 YOLOv8 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.6 Face Detection Using YOLOv8 . . . . . . . . . . . . . . . . . . . . . . 25

4.7 Target tracking using Extended Kalman Filters . . . . . . . . . . . . . . 28

5. Expected Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6. Feasibility Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

7. Project Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

8. Estimated Project Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

9. APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Appendix A: Mean Average Precision (mAP) . . . . . . . . . . . . . . . . . 42

Appendix B: Reference Design of Launching Mechanism . . . . . . . . . . . 44

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

List of Figures

Figure 3-1 ESP-32 Camera Module . . . . . . . . . . . . . . . . . . . . . . . 9


Figure 3-2 Stepper Motor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Figure 3-3 Servo Motor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Figure 3-4 A4988 Motor Driver . . . . . . . . . . . . . . . . . . . . . . . . . 11
Figure 4-1 System Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 4-2 String spring constant graph . . . . . . . . . . . . . . . . . . . . . 15
Figure 4-3 Relation between pitch angle(θ ) and launch velocity(v) . . . . . . . 16
Figure 4-4 Yaw angle calculation . . . . . . . . . . . . . . . . . . . . . . . . . 17
Figure 4-5 perceiving depth [9] . . . . . . . . . . . . . . . . . . . . . . . . . 18
Figure 4-6 Computer system achieving stereo vision [9] . . . . . . . . . . . 19
Figure 4-7 direction vector [9] . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Figure 4-8 direction vector intersection [9] . . . . . . . . . . . . . . . . . . . 20
Figure 4-9 Depth Calculation Method [9] . . . . . . . . . . . . . . . . . . . . 21
Figure 4-10 Depth Calculation Method 2 [9] . . . . . . . . . . . . . . . . . . . 21
Figure 4-11 System Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 4-12 Face Detection Using YOLO . . . . . . . . . . . . . . . . . . . . . 24
Figure 4-13 YOLOv8 architecture . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 4-14 Face Detection Using YOLO . . . . . . . . . . . . . . . . . . . . . 26
Figure 4-15 Extended Kalman Filter Loop . . . . . . . . . . . . . . . . . . . . 30
Figure 4-16 True Trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Figure 4-17 Plot of trajectory angle and range . . . . . . . . . . . . . . . . . . . 35
Figure 4-18 True Trajectory vs Estimated Trajectory . . . . . . . . . . . . . . . 36
Figure 5-1 Reference of expected outcome [10] . . . . . . . . . . . . . . . . . 38
Figure 7-1 Project Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Figure 9-1 Mean Average Precision Metrics . . . . . . . . . . . . . . . . . 42
Figure 9-2 3d view of pitch and yaw angle . . . . . . . . . . . . . . . . . . . . 44
Figure 9-3 Reference 3d design of dart shooting mechanism [10] . . . . . . . . 44

List of Tables

Table 4-1 Predefined Extended Kalman Filter Functions . . . . . . . . . . . . . 30


Table 8-1 Budget Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

List of Abbreviations

AI Artificial Intelligence

CHT Circle Hough Transform

CNC Computer Numerical Control

CNN Convolutional Neural Network

CPU Central Processing Unit

DST Discrete Sine Transform

EKF Extended Kalman Filter

FDDB Face Detection Dataset and Benchmark

FPN Feature Pyramid Network

IDE Integrated Development Environment

IoU Intersection over Union

KF Kalman Filter

KLT Kanade-Lucas-Tomasi

LBP Local Binary Pattern

mAP Mean Average Precision

MCU Microcontroller Unit

NMS Non-Maximum Suppression

PCA Principal Component Analysis

PWM Pulse Width Modulation

R-CNN Region-Based Convolutional Neural Network

SOTA State of the Art

SVM Support Vector Machine

YOLO You Only Look Once

1. INTRODUCTION

The "DartVision: AI-Driven Dart Targeting System Using Facial Recognition" project seeks to transform dart-throwing mechanisms by utilizing computer vision and AI to precisely target human subjects with non-lethal darts. The DartVision system, whose possible uses range from leisure sports to military defense systems, promises to improve safety, accuracy, and efficiency in dart-based activities through a combination of the latest technologies and creative engineering.

1.1 Background

The accuracy and precision of traditional dart throwing devices are limited, especially
when aiming at moving targets, as they frequently rely on manual aiming techniques.
Furthermore, conventional systems require continual human interaction because they
are unable to recognize and track targets on their own. By enabling automated target
detection, recognition, and accurate dart targeting, the development of AI and computer
vision technologies offers a chance to get beyond these restrictions.

Dart sports, such as competitive dart throwing and games like "501," have a long and illustrious history. Millions of fans have been playing dart sports in leagues, tournaments, and informal games around the world in recent years, as their popularity has skyrocketed. Dart throwing is a fun activity, but recreational dart throwing requires players to be extremely accurate and precise, which emphasizes the value of trustworthy dart targeting systems.

In order to achieve strategic goals and reduce dangers to friendly forces and civilians,
it is imperative for military operations to effectively target adversaries. Conventional
approaches to target acquisition and engagement frequently depend on guided weapon
systems or manual targeting, which might be inadequate or impracticable in some situ-
ations, particularly those involving asymmetric warfare or urban settings.

Defense think tanks like the RAND Corporation have performed research that points to
precise targeting as a key component in lowering civilian casualties and collateral dam-
age during military operations. The need for more sophisticated and accurate targeting
systems is highlighted by the analysis of historical data, which shows cases in which
poor targeting resulted in unintentional harm.

Additionally, market research studies show that the military and law enforcement in-
dustries are becoming more and more in need of technologically sophisticated targeting
systems. This pattern reflects the growing focus on autonomous targeting capabilities

and precision-guided weapons to improve operational effectiveness and reduce hazards.

1.2 Motivation

The ”DartVision: AI-Driven Dart Targeting System Using Facial Recognition” project
is driven by the following main reasons:

Improving Safety and Precision: Manual aiming methods are frequently used in tra-
ditional dart throwing devices, which can lead to inconsistent and inaccurate results,
especially when aiming at moving targets. The DartVision system attempts to improve
targeting precision and accuracy by combining AI-driven facial recognition and com-
puter vision technologies, making sure that darts hit their intended targets with the least
amount of variation. The system can be used for a variety of purposes, such as recre-
ational sports and military training exercises, because it prioritizes the safety of both
operators and targets through the use of non-lethal darts and strict safety regulations.

Resolving Limitations of Current Systems: Current dart targeting systems have a number of issues, including the incapacity to recognize and follow targets on their own
and adjust to shifting environmental circumstances. By creating an advanced system
that can detect, identify, and track targets in real time, the DartVision project aims
to address these issues. The device is capable of precisely identifying human targets
in the midst of complicated backgrounds by utilizing depth sensing technologies and
advanced AI algorithms. It can also dynamically modify the trajectory of the dart for
the best possible aiming.

1.3 Problem Definition

Several significant issues and restrictions with conventional dart aiming systems are ad-
dressed by the DartVision project, especially those pertaining to accuracy, automation,
and flexibility. The following are the main issues the project attempts to address:

Lack of Target Precision: Manual aiming methods are frequently used in traditional
dart throwing devices, which leads to inconsistent and inaccurate results, particularly
when aiming at moving targets or objects. Whether in casual sports or military drills,
this impreciseness can reduce the overall efficacy of dart-based activities. By creating
an AI-driven targeting system that can autonomously recognize and track human targets
with great precision and accuracy, the DartVision project aims to address this issue.

Inefficiency in Target Acquisition: Most conventional dart targeting systems are un-
able to identify and lock onto targets on their own, necessitating continuous human

interaction in order to aim and fire darts correctly. This manual procedure can be la-
borious and prone to mistakes, especially in busy or dynamic settings where targets
might be concealed or move quickly. The goal of the DartVision system is to simplify
the target acquisition procedure by combining computer vision and facial recognition
technologies. This will allow for the quick and accurate identification of human targets
in real-time.

Restricted Flexibility to Environmental Factors: Current dart targeting systems could find it difficult to adjust to shifting environmental factors such as shifting light-
ing, background clutter, or occlusions. These elements may have a major influence
on the system’s detection and tracking accuracy, which could result in less-than-ideal
performance and dependability. The DartVision project uses AI algorithms and cutting-
edge depth sensing technology to improve the system’s resilience and adaptability in a
variety of environmental conditions.

Safety Issues with Target Engagement: When using darts in activities involving
humans, safety must always come first. Conventional dart throwing systems carry some
danger of damage or injury, particularly if the darts are thrown too hard or are not aimed
precisely. The DartVision project places a high priority on safety by using non-lethal
darts and putting strict safety procedures in place to reduce the possibility of mishaps
or unintentional injury to targets and operators.

Demand for Versatile Applications: Although dart targeting systems are often con-
nected to leisure activities, there is an increasing need for these systems to be used in
other fields, like security operations or military defense. Their usefulness and efficacy
may be limited by the inability of current technologies to scale and adapt to the various
needs of different applications. By creating a versatile targeting system that can satisfy
the needs of a range of applications, from recreational activities to tactical engagements,
the DartVision project seeks to close this gap.

In conclusion, the goal of the DartVision project is to create a sophisticated AI-driven targeting system that will overcome the shortcomings and difficulties present in current
dart aiming systems. Through addressing problems with accuracy, automation, flexibil-
ity, safety, and adaptation, the project hopes to provide a game-changing solution that
raises the bar for dart-based activities’ efficacy and efficiency in a variety of contexts.

1.4 Project Objectives

• To create an AI-powered dart aiming system that uses computer vision and facial
recognition to target accurately and independently.

• To ensure safety and versatility by integrating advanced depth sensing, non-lethal


dart propulsion, and stringent safety protocols.

1.5 Applications

The application areas of the project include:

• Recreational Sports: By providing players with an advanced targeting system,


improving the overall gaming experience, and encouraging friendly competition,
DartVision has the potential to completely transform recreational dart games.

• Practice and Training: DartVision is an excellent training tool for darts play-
ers who want to improve. It can provide immediate feedback on accuracy and
technique to the players.

• Entertainment Venues: Entertainment venues such as arcades, theme parks,


and recreational centers can integrate DartVision systems into their attractions,
offering patrons an immersive and engaging dart throwing experience.

• Military Training: DartVision’s facial recognition and targeting capabilities


make it suitable for military training exercises, allowing soldiers to practice target
acquisition and engagement in simulated combat scenarios.

• Crowd Control: In situations where crowd management is crucial, such as


public events or protests, DartVision can help security personnel identify and
neutralize specific targets within crowds while minimizing collateral damage.

1.6 Project Scope

1.6.1 Capabilities

• Real-Time Face Detection: The DartVision system is expected to be capable of


detecting human faces in real-time using YOLO-based face detection algorithms,
enabling rapid identification of targets.

• Continuous Tracking: Using the Kalman Filter (KF), DartVision ensures continuous face tracking and blurs all non-target faces, maintaining focus on the target.

• Precise Dart Targeting: By integrating servo and stepper motors for dart con-
trol, the system can accurately adjust dart trajectory based on the tracked position
of the target’s face, enhancing targeting precision.


1.6.2 Limitations

• Environmental Constraints: While efforts can be made to optimize perfor-


mance under various conditions, the system may still encounter challenges in
extreme environments, such as extremely low light or heavily cluttered back-
grounds.

• Hardware Limitations: The capabilities of the DartVision system may be con-


strained by the hardware used, including limitations in sensor resolution, motor
precision, and computational resources.

2. LITERATURE REVIEW

In the research of S.R. Rath[1], moving objects in a video are detected using computer
vision techniques. Frame differencing and computer vision techniques are implemented
to detect whether there are any moving objects in a video. The video is divided into
multiple frames, which are made up of pixels of colors. The current frame is subtracted
from the past frame, and the color is identified. Based on the color, it is determined that
something has moved or changed position.

The authors of [2] suggested four basic methods for solving object segmentation problems for detecting moving regions: background subtraction, temporal differencing, statistical methods, and optical flow. Background subtraction is commonly used to detect moving regions in images but requires a good background model to handle dynamic scenes. Temporal differencing uses pixel differences between consecutive frames but can create holes in moving objects. Optical flow detects independent motion even with camera movement but is computationally intensive and noise-sensitive. Statistical methods dynamically update background models to classify pixels as foreground or background based on statistical characteristics.

This paper introduces a robust algorithm based on background subtraction and DST
for moving object detection and segmentation. The proposed method reduces com-
putational complexity compared to traditional techniques and effectively identifies and
segments moving objects in static backgrounds.

The research paper by Shivam Singh and Prof. S. Graceline Jasmine [3] focuses on
developing an automated face recognition system using several algorithms to enhance
accuracy and efficiency. The system integrates face detection, feature extraction, and
recognition algorithms to automatically identify individuals from still images or video
frames. Key algorithms employed include the Viola-Jones algorithm for face detection
using Haar cascade classifiers, the Kanade-Lucas-Tomasi (KLT) tracker for continuous
face tracking, and Principal Component Analysis (PCA) for feature extraction. The pro-
posed system involves multiple stages: image capture, face detection, pre-processing,
database development, and post-processing for real-time recognition. The effectiveness
of the system is demonstrated in varying lighting conditions and emphasizes its poten-
tial applications in security systems, surveillance, and identity verification. Despite its
robustness, the system faces challenges such as handling different poses, facial expres-
sions, and poor lighting conditions, suggesting areas for future improvement.

The research paper "Real-time face detection based on YOLO" presented by Wang Yang and Zheng Jiachun [4] details the application of the YOLO (You Only Look Once) network for face detection. YOLO stands out due to its high detection
speed and accuracy, which are crucial for real-time applications. The paper [4] com-
pares YOLO with other object detection methods like R-CNN, emphasizing YOLO’s
end- to-end training and detection process that integrates feature extraction, classifica-
tion, and regression into a single network. The YOLOv3 variant, which the authors
focus on, uses a multi-scale feature map and up-sampling techniques to improve de-
tection, especially for small objects. They also discuss the importance of adapting the
anchor boxes for specific tasks via dimension clustering using the k-means algorithm,
which enhances detection accuracy. Experimental results using datasets like WIDER
FACE, Celeb Faces, and FDDB demonstrate YOLOv3’s robustness and faster detection
times, confirming its suitability for real-time face detection even in complex environ-
ments.

The research paper "Review and Comparison of Face Detection Techniques" by Sudipto Kumar et al. [5] compares various face detection methods, focusing on Haar-like cascade classifiers, Local Binary Pattern (LBP) cascade classifiers, and Support Vector
Machine (SVM)-based methods. It evaluates these techniques based on detection time,
accuracy, performance in low light, and effectiveness on diverse skin tones, specifically
dark complexions. Haar cascade classifiers, while accurate, struggle with low light and
dark skin tones, and have high false positive rates. LBP classifiers perform well in chal-
lenging lighting and with dark complexions but have lower accuracy and higher CPU
usage. SVMs, although accurate, are slower and less effective in low light conditions.
The study concludes that while each method has strengths and weaknesses, a combi-
nation of features from Haar and LBP classifiers could potentially yield better overall
performance.

In the paper "Design and Control of an Articulated Robotic Arm for Archery," AhmadRafiq Mohd Khairudin et al. [6] explore the innovative application of robotics in sports, specifically focusing on archery. The study highlights the development and integration of a robotic system comprising a Universal Robot UR5 robotic arm, motion controllers, and a vision-based targeting system. Utilizing OpenCV algorithms, namely the Circle Hough Transform (CHT) and color and contour detection, the robot identifies the target center, aims, draws, and releases the arrow with a high degree of accuracy (87.56%). This approach addresses the challenge of hand-eye coordination in archery,
leveraging the robot’s ability to perform repetitive tasks without tremors. The paper
also discusses the methodology for training the arm, including setting up waypoints for
accurate targeting and shooting, and compares the efficiency of different algorithms in
target detection. The results indicate that the OpenCV algorithm is more effective for

dynamic targeting than the CHT algorithm. The study concludes with a proof of con-
cept demonstrating the feasibility of using collaborative robots for archery, suggesting
further research on enhancing precision and automating more complex tasks such as
nocking the arrow.

Among the many object detection algorithms available today, YOLO (You Only Look Once) can be considered one of the finest. Many researchers have used YOLO for real-time object detection; one example is the research paper [7] by Gudala Lavanya and Sagar Dhanraj Pande that explores the YOLO algorithm's transformative impact on real-time object detection. YOLO's unique approach processes the entire image at once, making it highly efficient and fast. YOLO works by dividing the image into grids, and each grid cell predicts a fixed number of bounding boxes and class probabilities. They also describe the architectural improvements of YOLO from version v1 to v5, where v1 focused on the basic concept while the v5 model introduced features like mosaic augmentation that maintain a balance between speed and accuracy. Despite challenges such as detecting small objects, YOLO's continuous architectural refinements ensure it remains a pivotal tool in computer vision.

The article "Kalman Filter and Its Application" by Qiang Li et al. [8] provides a comprehensive survey of the Kalman filter (KF) and its variations, including the Extended
Kalman filter (EKF) and Unscented Kalman filter (UKF). The Kalman filter, introduced
by R. E. Kalman in 1960, is widely used for optimal estimation in dynamic systems,
particularly for tasks like target tracking and navigation. The paper discusses the ba-
sic theories, strengths, and limitations of each filter type. The standard KF is suited
for linear systems with Gaussian noise, while EKF extends its application to nonlinear
systems through linearization, though it can suffer from divergence if noise estimations
are inaccurate. The UKF further improves performance by approximating the probabil-
ity distribution of the nonlinear function without needing linearization, making it more
accurate and easier to implement.

3. REQUIREMENT ANALYSIS

The components required for the proper implementation of our project are provided
below.

3.1 Hardware Requirements

The required hardware components are listed as below:

3.1.1 ESP32cam

The ESP32-CAM is a versatile, low-cost microcontroller with integrated Wi-Fi and Bluetooth, featuring an OV2640 camera. It is used to control hardware components such as servo and stepper motors via its GPIO pins, using PWM signals for precise movement. Additionally, the ESP32-CAM can calculate facial depth by processing images captured by its camera. This involves analyzing facial features to estimate distance, which is useful in applications like security and robotics. Its ability to handle both hardware control and real-time image processing makes the ESP32-CAM ideal for our project.

Figure 3-1 : ESP-32 Camera Module

3.1.2 Stepper motor(STH-39H112-06/NEMA17-42HS40)

The stepper motor serves as a crucial component of the project, primarily responsible for the movement of the mechanical hardware. Stepper motors are usually preferred in applications that require precise control over rotation angle and speed, such as CNC machines and other automated systems. The model we have chosen offers a balance between torque, size, and power consumption, meeting the specific requirements of the mechanical components it will drive.

Figure 3-2 : Stepper Motor

3.1.3 Servo motor

A servo motor is a type of motor that can be precisely controlled in terms of movement or position by receiving signals from a controller. It operates on a closed-loop feedback system, where the control circuit constantly provides feedback about the position of the motor's shaft based on the value of the potentiometer attached to the shaft.

The project uses the SG90 (SG refers to "Servo Gear" and 90 refers to its approximate rotation capability of ±90 degrees), which provides a stall torque of up to 1.8 kgf.cm. Its operating voltage range is 4.5 V to 6 V [9].

Figure 3-3 : Servo Motor

3.1.4 Motor Driver (A4988)

Motor Drivers, represented by the A4988 model, play a vital role in controlling the
stepper motors. These drivers convert the signals from the microcontroller into the
necessary power and sequence of pulses to drive the stepper motor effectively. The
A4988, in particular, is known for its reliability and compatibility with various stepper
motor types. It provides microstepping capabilities, allowing for smoother and more
precise control over the motor’s movement, contributing to the overall accuracy and

efficiency of the system.

Figure 3-4 : A4988 Motor Driver

3.2 Software Requirements

The necessary software required for the project are mentioned below:

3.2.1 Jupyter Notebook

Jupyter Notebook is a popular open-source tool for interactive computing, blending code, text, and visualizations in a single document. It plays a crucial role in evaluating data and performance. This tool aids us in refining the system and testing reliability during the development and testing phases. Jupyter Notebook 6.4.3 has been chosen for its interactive data analysis capabilities.

Arduino IDE: The Arduino Integrated Development Environment (IDE) is a user-friendly software platform used for programming the NodeMCU (ESP32). It provides a simple interface for writing, compiling, and uploading code to microcontroller boards, making it accessible.

3.2.2 Python

Python, a versatile and widely used programming language, assumes a multifaceted role in the software architecture. Beyond its prowess in algorithm implementation, Python is harnessed for image processing tasks and interfacing with microcontrollers. Additionally, Python's flexibility allows for seamless integration with microcontrollers, facilitating communication and data exchange between the central processing unit and the computational algorithms implemented in Python.
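As a minimal sketch of this Python-to-microcontroller interfacing (using the pyserial package; the serial port name, baud rate, and message format below are assumptions for illustration rather than fixed design choices), the computed pitch and yaw angles could be sent to the ESP32-CAM as follows.

Python code (illustrative):
import serial  # pyserial

# Assumed port and baud rate; adjust to the actual ESP32-CAM connection.
ser = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)

def send_angles(pitch_deg, yaw_deg):
    """Send pitch and yaw as one comma-separated line (hypothetical protocol)."""
    message = f"{pitch_deg:.2f},{yaw_deg:.2f}\n"
    ser.write(message.encode("ascii"))

send_angles(12.5, -3.0)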

3.2.3 YOLOv8

The integration of YOLOv8 (You Only Look Once, version 8) introduces a sophisticated object detection model into the software repertoire. YOLOv8 is renowned for its real-time object detection capabilities, making it an apt choice for identifying and localizing faces within the captured images. The model's efficiency in processing images and detecting objects with high accuracy aligns with the project's need for robust and

rapid face detection. YOLOv8 contributes to the software’s capability to analyze the
visual data captured by the camera and extract relevant information for further process-
ing.

4. SYSTEM ARCHITECTURE AND METHODOLOGY

4.1 Theoretical Background

The major concept underlying DartVision is how a real-life object or a specific human being can be tracked by applying a machine learning algorithm. The YOLOv8 object detection system, like its predecessors, is a single-stage detector that processes an input image to predict bounding boxes and class probabilities for objects in a single pass. How YOLOv8 operates, broken down into several key stages, is described in Section 4.5.

4.2 System Block Diagram

The system block diagram for DartVision is as shown below:

Figure 4-1 : System Block Diagram

An explanation of the components in the system block diagram is presented below:

4.2.1 Computer Processing

This section includes all the tasks that are processed inside the computer's CPU.

Camera Modules: Two camera modules are required in this project. One of the two modules (camera module 1) is used to capture real-time video, and both modules are used together for stereo vision, which helps in calculating the depth of the target from the launching site.

Image Frame: The real-time video from camera module 1 is converted into a sequence of image frames.

YOLO face detection model: After an image frame is obtained, the YOLOv8 model is used for face detection as shown in Figure 4-14.

Live GUI feedback system: A live camera video feed is obtained using OpenCV, and the GUI is designed using Tkinter or PyQt. The YOLO face detection model is integrated with the GUI to visualize the real-time detection and tracking of faces.

Similarly, a mouse-click callback function lets the user click on the targeted face, whose coordinates are saved separately and later used to calculate the pitch and yaw angles.

A trigger confirmation button is provided in the GUI interface such that, when it is pressed, the trigger mechanism is activated.

Extraction of the specific face coordinate: The coordinate of the targeted face in the GUI is saved in a unique variable, which is further used in the real-world coordinate conversion.

Coordinate conversion to real-world coordinates: The coordinate of the target face is converted into a real-world coordinate with respect to the launching position of the system.

Calculation of launching velocity: Parameters such as the coefficient of stiffness (k), the displacement (x) of the string, and the mass (m) of the dart are used to find the launching velocity. Factors such as air resistance and other forms of energy loss can be taken into consideration after several experiments.

Shooting Spring Constant: A force gauge can be used to measure the dart draw weight at different positions, and linear regression can be used to obtain the dart string's spring constant.

Figure 4-2 : String spring constant graph

The slope of the line of best fit is the spring constant.

F = −Kx (4.1)

In the above example, K = 40
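A minimal sketch of that fit using NumPy's least-squares polynomial fit; the force-gauge readings below are invented purely for illustration.

Python code (illustrative):
import numpy as np

# Hypothetical force-gauge readings: draw displacement (m) vs. measured force (N).
x = np.array([0.02, 0.04, 0.06, 0.08, 0.10])
F = np.array([0.8, 1.6, 2.4, 3.2, 4.0])

k, intercept = np.polyfit(x, F, 1)  # slope of the line of best fit = spring constant
print(f"Estimated spring constant k = {k:.1f} N/m")   # about 40 for these sample points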

Conservation of Energy:

Now that the spring constant is known, the dart's initial velocity can be solved for, since energy is conserved from the dart string's potential energy to the kinetic energy of the dart upon release.

\frac{1}{2} k_{\text{spring}} x^2 = \frac{1}{2} m v_0^2 \qquad (4.2)

v_0 = x \sqrt{\frac{k}{m}} \qquad (4.3)
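For illustration, Equation 4.3 can be evaluated directly; the values of k, x, and m below are assumptions, not measured parameters of the system.

Python code (illustrative):
import math

k = 40.0   # spring constant (N/m), from the fit above
x = 0.10   # string displacement (m), assumed
m = 0.02   # dart mass (kg), assumed

v0 = x * math.sqrt(k / m)   # Equation 4.3
print(f"Launch velocity v0 = {v0:.2f} m/s")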

Calculation of Pitch Angle:

As the dart is in flight, it moves along a parabolic curve. The objective is to control the shape of that curve, or more precisely, to control the angle and velocity at which the dart hits the target. A diagram of the dart's motion is shown in Figure 4-3.

Figure 4-3 : Relation between pitch angle(θ ) and launch velocity(v)

The depth and height of the target are known using the camera, as previously discussed. There are multiple solutions for the initial angle and initial velocity if the final angle and final velocity of the dart are not considered. To reduce the solution space, the initial velocity is set to that obtained at the maximum extension of the string, and the desired angle is solved for using the formula below.

h_2 = h_1 + d \tan(\theta) - \frac{g d^2}{2 v_0^2 \cos^2(\theta)} \qquad (4.4)

where h1 = height of the launching mechanism, h2 = height of the target, g = acceleration due to gravity, and θ = pitch angle.

This pitch angle (θ) might have errors due to air resistance and energy losses from multiple factors. Considering these factors, (θ + ψ) is calculated and passed to the ESP32-CAM in order to launch the dart within the desired range of the target. Here, ψ is a constant that accounts for the loss factors.
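Because 1/cos^2(θ) = 1 + tan^2(θ), Equation 4.4 reduces to a quadratic in tan(θ), so the pitch angle can be solved in closed form. The sketch below assumes example values for v0, d, h1, and h2 and leaves out the loss correction ψ.

Python code (illustrative):
import math

g = 9.81                 # m/s^2
v0 = 4.47                # launch velocity (m/s), from the energy calculation above
d = 1.5                  # horizontal depth to the target (m), assumed
h1, h2 = 0.30, 0.45      # launcher and target heights (m), assumed

# Equation 4.4 with u = tan(theta):  a*u^2 - d*u + (h2 - h1 + a) = 0,  where a = g*d^2 / (2*v0^2)
a = g * d**2 / (2 * v0**2)
disc = d**2 - 4 * a * (h2 - h1 + a)
if disc < 0:
    raise ValueError("Target is out of range for this launch velocity")
u = (d - math.sqrt(disc)) / (2 * a)    # flatter of the two admissible solutions
theta = math.degrees(math.atan(u))     # pitch angle before adding the loss correction psi
print(f"Pitch angle theta = {theta:.1f} degrees")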

Calculation of Yaw angle: Using the value of the real-world coordinate of the face, which is stored in a unique variable, and the initial position of the servo, the yaw angle is calculated.

Figure 4-4 : Yaw angle calculation

From Figure 4-4, using the depth value (d) and the real-world x and y coordinate values of the target object/face, we can calculate the yaw angle.

The mathematical expressions to calculate the yaw angle are:

d^2 = y^2 + b^2 \qquad (4.5)

b = \sqrt{d^2 - y^2} \qquad (4.6)

Now the yaw angle can be calculated using the sine of the angle as,

\sin(\phi) = \frac{x}{\sqrt{d^2 - y^2}} \qquad (4.7)

\phi = \sin^{-1}\!\left(\frac{x}{\sqrt{d^2 - y^2}}\right) \qquad (4.8)

Here, this yaw angle (φ) is passed to the ESP32-CAM in order to point the dart launching tube in the direction of the target.
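A direct evaluation of Equations 4.6 to 4.8, with assumed real-world target coordinates, looks as follows.

Python code (illustrative):
import math

d = 1.5    # depth to the target (m), assumed
x = 0.25   # real-world x offset of the target face (m), assumed
y = 0.45   # real-world y offset of the target face (m), assumed

b = math.sqrt(d**2 - y**2)              # Equation 4.6
phi = math.degrees(math.asin(x / b))    # Equations 4.7 and 4.8
print(f"Yaw angle phi = {phi:.1f} degrees")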

Depth Calculation: Since stereo vision is used to calculate the depth from the object to the launching site, the intrinsic and extrinsic parameters of the cameras are first estimated; the optical center and focal length are intrinsic parameters, whereas the position of the camera in 3D space is an extrinsic parameter. Using these data, both cameras are calibrated. Finally, a depth value is assigned to each pixel of the live video feed.

Computer stereo vision involves extracting 3D information from 2D images, similar to


the images captured by CCD cameras. By comparing data from different viewpoints,
stereo vision systems determine the relative positions of objects in each view, enabling
the creation of a three-dimensional understanding of a scene. This technique is widely
applied in various fields, including advanced driver assistance systems and robotic nav-
igation.

This process mimics human vision. Our brains merge the slightly different images from
each eye, allowing us to perceive depth and spatial relationships, resulting in our three-
dimensional view of the world.

Figure 4-5 : perceiving depth [9]

By employing its prior knowledge of the relative distance between the cameras, the computing system uses triangulation to determine the depth (d).

The depth of each point has to be calculated in order to produce a 3D image from 2D images. Using this, each point's relative depth is found.

Figure 4-6 : Computer system achieving stereo vision [9]

An image (or image channel) with data on the distances between scene objects’ surfaces
as seen from a particular perspective is called a depth map. In 3D computer graphics
and computer vision, scene depths are commonly represented in this manner.

The Foundation of Stereo Vision in Geometry: The geometry of stereo vision is known as epipolar geometry. The 3D points and their projections onto the 2D images
have a wide range of geometric relationships. These connections have been worked out
for the model of the pinhole camera.

When captured (projected) in a picture, a 3D object is projected into a 2D (planar) projective space. This so-called "planar projection" is problematic since it results in a loss of depth.

The apparent shift of objects between the two stereo images is what is measured. When we alternately close one eye and then the other, we see that objects close to us appear to shift more than those farthest away, which shift very little. This behavior is known as "disparity."

The Direction Vector:

A direction vector in epipolar geometry is a three-dimensional vector that originates
from an image pixel:

Figure 4-7 : direction vector [9]

The direction vector, as the name suggests, is the direction from where the light ray
arrives at the pixel sensor. This line thus carries all the 3D points that could be candidate
sources for the 2D pixels in the image. In the above figure, the direction vector Ls1 S1
originates from the point Ls1 , which is the “left” 2D pixel corresponding to the 3D point
S1 in the scene.

Direction Vector Intersection: As a result, direction vectors from the 2D pixels in a stereo pair of photos will indicate a shared 3D point in the 3D scene. A direction
vector’s points are all potential sources. Since there is only one possible intersection
point for two vectors, that point is used as the source:

Figure 4-8 : direction vector intersection [9]

In the above figure 4-8 , the direction vectors from the left and right images (Ls1 S1 and
Rs1 S1 , respectively) intersect at the single source S1 . This 3D source point in the scene
is the point from where light rays cast image pixels Ls1 and Rs1 in the left and right
images.

Depth Calculation: The distance between the cameras should be known and should be very small compared to the distance between the cameras and the object. Then the location of the 3D point in space can be determined by triangulation. The depth is a perpendicular cast from that point onto the line joining the two cameras:

Figure 4-9 : Depth Calculation Method [9]

Figure 4-9 above shows the actual depth ds1 for the point from the line joining the two cameras. The angle between the line ds1 and the line Ls1Rs1 is not exactly 90 degrees. In reality, however, the distance Ls1Rs1 is very small compared to ds1, so the angle between them is approximately 90 degrees. Since the location of S1 is determined by triangulation with the help of the relative distance Ls1Rs1, the depth ds1 can be calculated using the Pythagorean theorem.

Figure 4-10 : Depth Calculation Method 2 [9]

Since s is very large compared to t, the angle ∠S1Ms1Rs1 approaches 90°. Lengths Ls1Ms1 and Ms1Rs1 are almost the same (denoted by t). Also, lengths Ls1S1 and Rs1S1 are almost the same (denoted by s). Applying the Pythagorean theorem, s^2 = d_{s1}^2 + t^2. Solving for the depth of point S1,

d_{s1} = \sqrt{s^2 - t^2} \qquad (4.9)

Since s is very large compared to t, the depth d_{s1} is close to s.

Key Concepts for Mathematical Implementation of Stereo Vision: At the pixel level, triangulation is used to identify a point in 3D space from the left and right pixel points in a pair of stereo images. A disparity map is used for large images with millions of pixel points.

Triangulation in Computer Vision:

Triangulation in computer vision finds a 3D point’s location using its image projections
and camera information. It takes 2D image points and camera matrices to calculate a
3D point’s location. There are various techniques (mid-point, DLT, essential matrix)
that differ in complexity and accuracy.
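As an illustrative sketch of such triangulation with OpenCV, the intrinsic matrix, baseline, and pixel correspondences below are assumed values, not calibration results of this system.

Python code (illustrative):
import cv2
import numpy as np

K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])          # assumed intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                    # left camera projection
P2 = K @ np.hstack([np.eye(3), np.array([[-0.06], [0], [0]])])       # right camera, 6 cm baseline (assumed)

pts1 = np.array([[350.0], [240.0]])   # matching pixel in the left image
pts2 = np.array([[322.0], [240.0]])   # matching pixel in the right image

points4d = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x1 result
points3d = (points4d[:3] / points4d[3]).T              # Euclidean 3D coordinates
print(points3d)                                        # depth (z) is about 1.5 m for these numbers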

Disparity Map: Disparity maps show how far corresponding points have shifted between the two stereo images. This shift, inversely related to depth, helps us create 3D models.

Building a disparity map involves finding matching pixels between the left and right images (solving the correspondence problem). Rectifying the images simplifies this by aligning corresponding points horizontally. Block matching is a common technique to find these matches. Finally, the disparity map is converted to a depth map using camera data through a process of triangulation.
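A minimal block-matching sketch using OpenCV's StereoBM is given below; it assumes already-rectified image pairs, and the focal length and baseline are placeholder calibration values.

Python code (illustrative):
import cv2
import numpy as np

# Rectified left/right frames are assumed; the file names are placeholders.
left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)        # block matching
disparity = stereo.compute(left, right).astype(np.float32) / 16.0    # fixed-point result -> pixels

fx = 700.0        # focal length in pixels (assumed, from calibration)
baseline = 0.06   # camera separation in metres (assumed)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = fx * baseline / disparity[valid]   # depth = f * B / disparity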

4.3 System Flowchart

Figure 4-11 : System Flowchart

The flowchart outlines a detailed process for real-time video processing and object tracking using the YOLOv8 model. The process initiates with capturing live video, which is subsequently fed into the YOLOv8 model to identify objects and obtain bounding box coordinates along with their confidence levels. This information is then displayed on a graphical user interface (GUI), enabling user interaction.

The user can click on a specific frame in the video feed, and the system captures the

x and y coordinates of the click. It then iterates through the bounding box coordinates
provided by the YOLOv8 model to determine if the click corresponds to any detected
object. If a match is found, the system applies a tracking algorithm to the identified
object, enabling continuous monitoring and tracking.

This sequence ensures that the system can accurately track objects in real-time based
on user input, leveraging the object detection capabilities of the YOLOv8 model and
the interactive functionality of the GUI. The combination of real-time video capture,
object detection, user interaction, and tracking makes the system robust for applications
requiring precise and dynamic object monitoring.
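A minimal sketch of the click-selection step described above uses an OpenCV mouse callback; the window name, bounding-box format, and variable names are assumptions for illustration.

Python code (illustrative):
import cv2

detections = []     # filled every frame with (x1, y1, x2, y2, confidence) tuples from YOLOv8
target_box = None   # the face box the user clicked on

def on_mouse(event, x, y, flags, param):
    """Save the clicked face if the click falls inside a detected bounding box."""
    global target_box
    if event == cv2.EVENT_LBUTTONDOWN:
        for (x1, y1, x2, y2, conf) in detections:
            if x1 <= x <= x2 and y1 <= y <= y2:
                target_box = (x1, y1, x2, y2)
                break

cv2.namedWindow("DartVision")
cv2.setMouseCallback("DartVision", on_mouse)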

4.4 Dataset Training Block Diagram

The block diagram for dataset training is as shown below:

Figure 4-12 : Face Detection Using YOLO

4.5 YOLOv8 Architecture

The YOLOv8 object detection system, like its predecessors, is a single-stage detector that processes an input image to predict bounding boxes and class probabilities for objects in a single pass. Here is a look at how YOLOv8 operates, broken down into several key stages.

The architecture consists of a Backbone, a Neck, and a Head. The Backbone is a convolutional neural network (CNN) that is primarily responsible for extracting feature maps

Figure 4-13 : YOLOv8 architecture

from the input image. It processes the image through multiple layers of convolutions, capturing various levels of abstraction and important spatial details. The Neck component is responsible for aggregating the features extracted by the Backbone. This is typically achieved using path aggregation blocks like the Feature Pyramid Network (FPN), which combines feature maps from different scales to create a rich, multi-scale feature representation. The Neck then passes these features to the Head, which predicts the final bounding boxes and class probabilities for detected objects.

4.6 Face Detection Using YOLOv8

The state-of-the-art (SOTA) deep learning model YOLOv8 is intended for computer vision applications that require real-time object recognition. YOLOv8 has been widely recognized for its real-time performance and its accuracy. For our project, the YOLO technique is used to detect faces. The following describes how YOLO works for face detection.

Figure 4-14 : Face Detection Using YOLO

The YOLOv8 algorithm works based on the following four main approaches (a brief Python inference sketch is given after the list):

1. Grid Division: The first step starts by taking a frame of a real-time video/image and dividing the original image into S×S grid cells of equal shape, where S in our case is 4, as shown in the figure above. Each cell in the grid is responsible for localizing and predicting the class and confidence score.

2. Bounding Box Regression: The next step is to determine the bounding boxes, which correspond to rectangles highlighting all the faces in the image. We can have as many bounding boxes as there are faces within a given image. YOLO determines the attributes of these bounding boxes using a single regression module in the following format, where Y is the final vector representation for each bounding box: Y = [pc, bx, by, bh, bw, c1, c2]. This is especially important during the training phase of the model.

• pc corresponds to the probability score of the grid containing a face. For instance, all the grids in red will have a probability score higher than zero.

• bx, by are the x and y coordinates of the center of the bounding box with
respect to the enveloping grid cell.

• bh, bw correspond to the height and the width of the bounding box with
respect to the enveloping grid cell.

• c1 and c2 correspond to the two classes (Player and Ball in the generic YOLO example); we can have as many classes as the use case requires, and for face detection a single face class suffices.

3. Intersection over Union (IoU): A single face in a real-time video can often have multiple grid box candidates for prediction, even though most of them may not be relevant. The main aim of the IoU (a value between 0 and 1) is to discard such grid boxes and only keep those that are relevant. Here is the logic behind it:

• The user defines an IoU selection threshold, which can be, for instance, 0.5.

• Then YOLO computes the IoU of each grid cell, which is the intersection area divided by the union area.

• Finally, it ignores the predictions of the grid cells having an IoU less than or equal to the threshold and considers those with an IoU greater than the threshold.

\text{IoU} = \frac{\text{Intersection Area}}{\text{Union Area}} \qquad (4.10)

4. Non-Maximum Suppression (NMS): Simply setting a threshold for the Intersection over Union (IoU) is not always sufficient to eliminate redundant detections and noise in object detection tasks, including face detection with YOLO (You Only Look Once). This is where Non-Maximum Suppression (NMS) plays a crucial role. NMS ensures that only the bounding box with the highest probability score is kept, effectively reducing the noise and improving the accuracy of the detection.
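The sketch below shows how such detections could be obtained with the ultralytics YOLOv8 API; the face-specific weights file name and the confidence threshold are assumptions (in practice a model fine-tuned on a face dataset such as WIDER FACE would be used).

Python code (illustrative):
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-face.pt")           # placeholder name for face-detection weights

frame = cv2.imread("frame.jpg")           # one frame taken from the live video feed
results = model(frame, conf=0.5)          # assumed confidence threshold of 0.5

for box in results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])   # bounding-box corners in pixels
    score = float(box.conf[0])               # detection confidence
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, f"{score:.2f}", (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)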

4.7 Target tracking using Extended Kalman Filters

The standard Kalman filter works for linear systems. The Extended Kalman filter (EKF) is used for non-linear systems by linearizing the equations around the current estimate. This allows the EKF to track objects where the motion or the measurements are non-linear. An example is when the object's position is measured in spherical coordinates but its state is represented in Cartesian coordinates. In the case of this project, the target might not have linear motion; hence, this model can be considered one of the best approaches for target tracking in DartVision.

State Update Model: A closed-form expression gives the predicted state as a function of the previous state x_k, the controls u_k, the noise w_k, and the time t:

x_{k+1} = f(x_k, u_k, w_k, t) \qquad (4.11)

The Jacobian of the predicted state with respect to the previous state is obtained by partial derivatives as:

F^{(x)} = \frac{\partial f}{\partial x} \qquad (4.12)

The Jacobian of the predicted state with respect to the noise is:

F^{(w)} = \frac{\partial f}{\partial w} \qquad (4.13)

These functions take simpler forms when the noise is additive in the state update equation:

x_{k+1} = f(x_k, u_k, t) + w_k \qquad (4.14)

F^{(w)} = I \qquad (4.15)

In this case, F^{(w)} is an identity matrix.

The state Jacobian can be specified using the StateTransitionJacobianFcn property of the trackingEKF object. If this property is not specified, the object computes the Jacobians using numeric differencing, which is slightly less accurate and can increase the computation time.

Measurement Model: In an extended Kalman filter, the measurement can also be a nonlinear function of the state and the measurement noise.

z_k = h(x_k, v_k, t) \qquad (4.16)

The Jacobian of the measurement with respect to the state is:

H^{(x)} = \frac{\partial z_k}{\partial x_k} \qquad (4.17)

The Jacobian of the measurement with respect to the measurement noise is:

H^{(v)} = \frac{\partial z_k}{\partial v_k} \qquad (4.18)

These functions take simpler forms when the noise is additive in the measurement equation:

z_k = h(x_k) + v_k \qquad (4.19)

In this case, H^{(v)} is an identity matrix.

In trackingEKF, the measurement Jacobian is specified using the MeasurementJacobianFcn property. If this property is not specified, the object computes the Jacobians using numeric differencing, which is slightly less accurate and can increase the computation time.

Extended Kalman Filter Loop: The extended Kalman filter loop is almost identical
to the loop of Linear Kalman Filters except that:

• The filter uses the exact nonlinear state update and measurement functions when-
ever possible.

• The state Jacobian replaces the state transition matrix.

• The measurement Jacobian replaces the measurement matrix.

Figure 4-15 : Extended Kalman Filter Loop

Predefined Extended Kalman Filter Functions:

The toolbox provides predefined state update and measurement functions to use in
trackingEKF.

Table 4-1 : Predefined Extended Kalman Filter Functions

Motion model: Constant velocity
  constvel      Constant-velocity state update model          1-D: [x; vx]
                                                              2-D: [x; vx; y; vy]
                                                              3-D: [x; vx; y; vy; z; vz]
  constveljac   Constant-velocity state update Jacobian       Same state representations as constvel
  cvmeas        Constant-velocity measurement model           x, y, and z represent the position and vx, vy, and vz the
                                                              velocity in the x-, y-, and z-directions, respectively
  cvmeasjac     Constant-velocity measurement Jacobian        Same as cvmeas

Motion model: Constant acceleration
  constacc      Constant-acceleration state update model      1-D: [x; vx; ax]
                                                              2-D: [x; vx; ax; y; vy; ay]
                                                              3-D: [x; vx; ax; y; vy; ay; z; vz; az]
  constaccjac   Constant-acceleration state update Jacobian   Same state representations as constacc
  cameas        Constant-acceleration measurement model       ax, ay, and az represent the acceleration in the x-, y-,
                                                              and z-directions, respectively
  cameasjac     Constant-acceleration measurement Jacobian    Same as cameas

Motion model: Constant turn rate
  constturn     Constant turn-rate state update model         2-D: [x; vx; y; vy; ω]
                                                              3-D: [x; vx; y; vy; ω; z; vz]
  constturnjac  Constant turn-rate state update Jacobian      Same state representations as constturn
  ctmeas        Constant turn-rate measurement model          ω represents the turn rate
  ctmeasjac     Constant turn-rate measurement Jacobian       Same as ctmeas

Example: Estimate 2-D Target States with Angle and Range Measurements Using
trackingEKF

Initialize Estimation Model

Assume a target moves in 2D with the following initial position and velocity. The
simulation lasts 20 seconds with a sample time of 0.2 seconds.

Matlab Code:
rng(2022); % For repeatable results
dt = 0.2; % seconds
simTime = 20; % seconds
tspan = 0:dt:simTime;
trueInitialState = [30; 1; 40; 1]; % [x;vx;y;vy]
initialCovariance = diag([100,1e3,100,1e3]);
processNoise = diag([0; .01; 0; .01]); % Process noise matrix

Assume the measurements are the azimuth angle relative to the positive-x direction and
the range from the origin to the target. The measurement noise covariance matrix is:
Matlab Code:

measureNoise = diag([2e-6;1]); % Measurement noise matrix. Units are m^2 and rad^2.

Preallocate variables in which to save results.


numSteps = length(tspan);
trueStates = NaN(4,numSteps);
trueStates(:,1) = trueInitialState;
estimateStates = NaN(size(trueStates));
measurements = NaN(2,numSteps);

Obtain True States and Measurements

Propagate the constant velocity model and generate the measurements with noise.
% Propagate the constant-velocity model with process noise and generate noisy measurements
for i = 2:numSteps
    trueStates(:,i) = stateModel(trueStates(:,i-1),dt) + sqrt(processNoise)*randn(4,1);
    measurements(:,i) = measureModel(trueStates(:,i)) + sqrt(measureNoise)*randn(2,1);
end
measurements(:,1) = measureModel(trueStates(:,1)) + sqrt(measureNoise)*randn(2,1);

Plot the true trajectory and the measurements.


figure(1)
plot(trueStates(1,1),trueStates(3,1),"r*",DisplayName="Initial Truth")
hold on
plot(trueStates(1,:),trueStates(3,:),"r",DisplayName="True Trajectory")
xlabel("x (m)")
ylabel("y (m)")
title("True Trajectory")
axis square

Output:

Figure 4-16 : True Trajectory

figure(2)
subplot(2,1,1)
plot(tspan,measurements(1,:)*180/pi)
xlabel("time (s)")
ylabel("angle (deg)")
title("Angle and Range")
subplot(2,1,2)
plot(tspan,measurements(2,:))
xlabel("time (s)")
ylabel("range (m)")

Output:

Figure 4-17 : Plot of trajectory angle and range

Initialize Extended Kalman Filter

Initialize the filter with an initial state estimate at [35; 0; 45; 0].

filter = trackingEKF(State=[35; 0; 45; 0],StateCovariance=initialCovariance, ...
    StateTransitionFcn=@stateModel,ProcessNoise=processNoise, ...
    MeasurementFcn=@measureModel,MeasurementNoise=measureNoise);
estimateStates(:,1) = filter.State;

Run Extended Kalman Filter And Show Results

Run the filter by recursively calling the predict and correct object functions.
for i=2:length(tspan)
predict(filter,dt);
estimateStates(:,i) = correct(filter,measurements(:,i));
end
figure(1)
plot(estimateStates(1,1),estimateStates(3,1),"g*",DisplayName="Initial Estimate")
plot(estimateStates(1,:),estimateStates(3,:),"g",DisplayName="Estimated Trajectory")
legend(Location="northwest")
title("True Trajectory vs Estimated Trajectory")

Figure 4-18 : True Trajectory vs Estimated Trajectory

Helper Functions: stateModel models constant-velocity motion without process noise.
function stateNext = stateModel(state,dt)
F = [1 dt 0 0;
0 1 0 0;
0 0 1 dt;
0 0 0 1];
stateNext = F*state;

end

measureModel models range and azimuth angle measurements without noise.


function z = measureModel(state)
angle = atan(state(3)/state(1));
range = norm([state(1) state(3)]);
z = [angle;range];
end

5. EXPECTED OUTCOME

After the completion of the project "DartVision", it is expected that the dart will be able to hit the target by itself with the help of the face detection and target-locking mechanism. As a reference, the system is closely related to an automatic archery robot, as shown in Figure 5-1.

(a) Bow targeting an apple on the head using image detection  (b) Bow hitting the apple on the head using image detection

Figure 5-1 : Reference of expected outcome [10]

6. FEASIBILITY ANALYSIS

The feasibility of our project is assessed based on several key factors.

• Computational Resources: The DartVision system integrates YOLO for facial
detection, Kalman Filter (KF) for tracking, and stereo vision for depth analysis.
While YOLO and stereo vision can be computationally demanding, the Kalman
Filter ensures efficient tracking, optimizing overall resource usage. By utilizing
Google Colab, Kaggle, and our personal computer's GPU, we can effectively
manage our computational requirements, making the project technically feasible
within our resource constraints.

• Data Availability and Suitability: For accurate facial detection and tracking, we
will utilize publicly available datasets such as FDDB and WIDER FACE, which
offer a vast number of labeled images. For depth analysis, stereo vision datasets
will be employed. These comprehensive datasets are sufficient for our project
needs, ensuring the availability and suitability of data are well within feasible
limits.

• Time Feasibility: The project is divided into two distinct phases. In the first
phase, we will develop the necessary mathematical concepts and derivations to
support our use of YOLO for facial detection, KF for tracking, and stereo vision
for depth analysis. In the second phase, we will implement and train the model
using the selected datasets, followed by integrating the model with servo and
stepper motors for the dart targeting system. This structured approach allows
for clear milestones and effective time management, ensuring the project can be
completed within the allocated time frame.

7. PROJECT SCHEDULE

Figure 7-1 : Project Schedule

8. ESTIMATED PROJECT BUDGET

This is the estimated budget for our project.

Table 8-1 : Budget Estimation

Name                Units    Unit Price
ESP32-CAM           2        2000
Stepper motor       1        1300
Servo Motor 9g      2        250
3D Printing         -        10000
Mechanical Parts    -        5000
Miscellaneous       -        3000
Total               -        23800

9. APPENDICES

Appendix A: Mean Average Precision (mAP)

Mean Average Precision (mAP) is a metric used to evaluate object detection models.
It is the mean of the Average Precision (AP) values calculated over recall values from
0 to 1. The mAP formula is based on the Confusion Matrix, Intersection over Union
(IoU), Recall, and Precision sub-metrics.

The confusion matrix is created using four attributes: True Positives (TP), True
Negatives (TN), False Positives (FP), and False Negatives (FN). True Positives (TP)
occur when the model correctly predicts a label that matches the ground truth, while
True Negatives (TN) occur when the model correctly does not predict a label that is
also absent in the ground truth. Conversely, False Positives (FP) arise when the model
incorrectly predicts a label that is not part of the ground truth, known as a Type I Error.
False Negatives (FN) happen when the model fails to predict a label that is actually
present in the ground truth, referred to as a Type II Error.

(a) Confusion Matrix  (b) Intersection over Union

Figure 9-1 : Mean Average Precision Metrics

Intersection over Union (IoU) indicates the overlap of the predicted bounding box with
the ground truth box. A higher IoU indicates that the predicted bounding box
coordinates closely resemble the ground truth box coordinates.
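
For illustration, the IoU of two axis-aligned boxes given in [x y width height] form can be computed as in the following minimal sketch; the box coordinates are arbitrary demonstration values.

Matlab Code:
boxPred = [50 50 100 80];   % predicted bounding box (illustrative values)
boxGT   = [60 55 100 80];   % ground-truth bounding box (illustrative values)
xI = max(boxPred(1), boxGT(1));                                   % left edge of intersection
yI = max(boxPred(2), boxGT(2));                                   % top edge of intersection
wI = max(0, min(boxPred(1)+boxPred(3), boxGT(1)+boxGT(3)) - xI);  % intersection width
hI = max(0, min(boxPred(2)+boxPred(4), boxGT(2)+boxGT(4)) - yI);  % intersection height
interArea = wI*hI;
unionArea = boxPred(3)*boxPred(4) + boxGT(3)*boxGT(4) - interArea;
IoU = interArea/unionArea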

Precision measures how well true positives are found out of all positive predictions.
The precision value may vary based on the model's confidence threshold.

\[
\text{Precision} = \frac{TP}{TP + FP}
\]

Recall measures how well true positives are found out of all ground-truth positives.

\[
\text{Recall} = \frac{TP}{TP + FN}
\]

Average Precision (AP) is determined as the weighted mean of precisions at each
threshold, where the weight is the increase in recall from the previous threshold. Mean
Average Precision (mAP) is the average of AP across all classes. The general process
to calculate AP involves generating prediction scores with the model, converting these
scores to class labels, calculating the confusion matrix (TP, FP, TN, FN), computing
precision and recall, determining the area under the precision-recall curve, and then
measuring AP. mAP is then obtained by averaging the AP values of each class.

\[
\text{mAP} = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{9.1}
\]

Equation 9.1 gives the mean Average Precision. The mAP incorporates the trade-off
between precision and recall and considers both false positives (FP) and false negatives
(FN). This property makes mAP a suitable metric for most detection applications.
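
The following minimal sketch illustrates this procedure for a single class; the detection match flags and ground-truth count are arbitrary demonstration values, and the area under the precision-recall curve is approximated with the trapezoidal rule.

Matlab Code:
isTP  = [1 1 0 1 0 1 0 0];   % detections sorted by confidence; 1 = matched to ground truth (TP), 0 = FP
numGT = 5;                   % number of ground-truth objects of this class (illustrative)
tp = cumsum(isTP);           % cumulative true positives
fp = cumsum(1 - isTP);       % cumulative false positives
precision = tp ./ (tp + fp);
recall    = tp ./ numGT;
AP = trapz([0 recall], [1 precision]);   % area under the precision-recall curve
% mAP is then the mean of the AP values over all classes, as in equation 9.1.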

Appendix B: Reference Design of Launching Mechanism

The dart throwing mechanism can be designed such that one servo controls the pitch
angle of the mechanism so that the dart gun's releasing mouth has a minimum offset
along the z-axis relative to the center of the target face.

Similarly, another servo controls the yaw angle so that the dart gun's releasing mouth
has a minimum offset along the x-axis relative to the center of the target face.

Figure 9-2 : 3D view of the pitch and yaw angles

The depth value obtained from the depth sensor gives the range over which the dart
must be thrown, which in turn determines the initial velocity the mechanism must
impart, as discussed in 4-3. A short sketch of this calculation is given below.
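
The following minimal sketch illustrates this calculation, assuming the idealized level-ground projectile range relation d = v0^2 sin(2*pitch)/g; the depth, pitch angle, and lateral offset values are arbitrary demonstration values.

Matlab Code:
g     = 9.81;            % gravitational acceleration (m/s^2)
d     = 2.5;             % range to the target face from the depth sensor (m), illustrative
pitch = deg2rad(15);     % pitch angle set by the first servo, illustrative
% Idealized range relation: d = v0^2*sin(2*pitch)/g  =>  v0 = sqrt(g*d/sin(2*pitch))
v0 = sqrt(g*d/sin(2*pitch));
% Yaw angle for the second servo from the lateral offset of the face centre
dx  = 0.20;              % offset of the face centre along the x-axis (m), illustrative
yaw = atan2(dx, d);      % radians
fprintf("v0 = %.2f m/s, yaw = %.1f deg\n", v0, rad2deg(yaw));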

(a) Isometric view of the 3D design  (b) Side view of the 3D design

Figure 9-3 : Reference 3D design of the dart shooting mechanism [10]

References

[1] S. R. Rath. "Moving object detection using frame differencing with OpenCV." [Online; accessed 17-June-2024]. (2020), https://debuggercafe.com/moving-object-detection-using-frame-differencing-with-opencv/.

[2] G. Thapa, K. Sharma, and M. K. Ghose, "Moving object detection and segmentation using frame," International Journal of Computer Applications, vol. 102, 2014.

[3] S. Singh and S. G. Jasmine, "Face recognition system," International Journal of Engineering Research & Technology (IJERT), vol. 8, 2019.

[4] W. Yang and Z. Jiachun, "Real-time face detection based on YOLO," in 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), Jeju Island, Korea (South), 2018.

[5] S. K. Mondal, I. Mukhopadhyay, and S. Dutta, "Review and comparison of face detection techniques," in Proceedings of International Ethical Hacking Conference, Springer Nature, 2019, pp. 3–14.

[6] A. R. M. Khairudin, Z. Baharuddin, H. Mohamed, and A. A. M. Faudzi, "Design and control of an articulated robotic arm for archery," in IEEE 5th International Symposium in Robotics and Manufacturing Automation (ROMA), 2022.

[7] G. Lavanya and S. D. Pande, "Enhancing real-time object detection with YOLO algorithm," EAI Endorsed Transactions on Internet of Things, vol. 23, no. 5, pp. 123–134, 2023. DOI: 10.4108/eetiot.4541.

[8] Q. Li, R. Li, K. Ji, and W. Dai, "Kalman filter and its application," in Proceedings of the 2015 8th International Conference on Intelligent Networks and Intelligent Systems, Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, China, 650500, 2015.

[9] DrMax, Computer vision: Stereo 3D, Accessed: 2024-06-17, 2024. https://www.baeldung.com/cs/stereo-vision-3d.

[10] K. Carter, Robot archer, Accessed: 2024-06-17, 2021. https://hackaday.io/project/179680/gallery#5b03f7f8eac091844f78badc2680e3ac.
