
2024 9th International Conference on Applying New Technology in Green Buildings (ATiGB)

Pose Estimation and Its Impact on Quadcopter Control: An Experimental Study
1st Minh-Quan Nguyen
Faculty of International Education
Ho Chi Minh City University of Technology and Education
Ho Chi Minh City, Vietnam
[email protected]

2nd Van-Ho Tran
Intelligent Systems Laboratory
Ho Chi Minh City University of Technology and Education
Ho Chi Minh City, Vietnam
[email protected]

3rd Minh-Tai Vo
School of Science, Engineering & Technology
RMIT University Vietnam
Ho Chi Minh City, Vietnam
[email protected]

4th My-Ha Le
Faculty of Electrical and Electronics Engineering
Ho Chi Minh City University of Technology and Education
Ho Chi Minh City, Vietnam
Corresponding author: [email protected]

Abstract—Autonomous quadcopter control based on computer vision has become a popular research direction. However, challenges persist due to the processing-speed and memory constraints of embedded edge computing, which limit accuracy and real-time performance. In this study, we introduce a novel gesture-controlled autonomous quadcopter system capable of achieving high accuracy in real-time scenarios. Firstly, the Lightweight OpenPose method is employed to extract gesture sequences. Secondly, we propose a gesture classifier based on the SimVP architecture. To validate our approach, we conduct an evaluation on a generated dataset, followed by testing in a real-world environment. The results of our evaluation and experiments demonstrate the effectiveness of the proposed method.

Keywords—Quadcopter control, pose estimation, embedded system, gesture-controlled quadcopter.

SUPPLEMENTARY MATERIAL

A video attachment to this work is available at:
Video experiment: https://youtu.be/ovRuzbg3Swo

I. INTRODUCTION

The applications of combining quadcopters with computer vision technology are diverse. These include obstacle detection and avoidance [1], vision-based landing [2], UAV navigation [3], inspection missions, and even autonomous quadcopter racing [4]. Building on these studies, our research focuses on creating a gesture-controlled quadcopter system. By using gesture recognition techniques and deep neural networks, we aim to develop an interactive interface that eliminates the need for conventional control mechanisms, making the operation of the quadcopter more natural and intuitive.

The traditional method of controlling a quadcopter typically involves a remote controller, which requires the operator to possess a certain level of expertise. For newcomers, achieving stable control can be challenging. Gesture control offers a more user-friendly alternative, which is particularly beneficial for beginners. By using gestures, operations are simplified, eliminating the need for physical devices and enhancing portability and convenience. Recent studies on controlling quadcopters with gestures have proposed and tested various methods. In [5], the authors discuss how Radio Control (RC) has traditionally been the method for controlling quadcopters, but it requires significant training. The paper explores alternatives such as Kinect and Leap Motion sensors for more natural interaction, although they have a limited range. The research introduces an innovative wearable human-quadcopter interface developed on a Raspberry Pi Zero, allowing even beginners to execute complex maneuvers such as takeoffs, landings, and various flight patterns (circle, square, spiral) using hand gestures. The effectiveness of this interface is validated through Gazebo simulations and field experiments, demonstrating its commercial viability. Study [6] introduces a wearable device that uses hand gestures to control UAV navigation via a Natural User Interface (NUI), specifically with the DJI Tello quadcopter. The device employs an MPU6050 sensor, enhanced by a complementary filter that combines accelerometer and gyroscope data to stabilize angular outputs and reduce noise. Experimental results show that hand gestures can effectively command the quadcopter, proving the feasibility of gesture-based systems for UAV control. These studies demonstrate the potential for developing intelligent and flexible quadcopter control systems.

The main contributions of this paper are summarized as follows: First, we propose a new action classification model that enhances the accuracy and efficiency of identifying and classifying various actions. Second, we apply this classification model to control a quadcopter, demonstrating its capability to operate flexibly and robustly in various scenarios. These contributions not only advance the field of action classification but also significantly improve the operational reliability and adaptability of quadcopter systems. The overall scheme is depicted in Fig. 1.

Fig. 1. The overall scheme of our proposed system.

II. SYSTEM OVERVIEW

A. Controller diagram

Fig. 2. System overview diagram.

In Fig. 2, the system overview diagram consists of two main parts:

(1) Ground Station Control (GSC): We use a PC to receive and collect data from the quadcopter through the Bluetooth connection between the quadcopter and the Mission Planner software.

(2) Executive Structure (ES): The Jetson Nano single-board computer is responsible for receiving images from the camera, performing the image-processing calculations, and then sending values back to the Pixhawk flight controller via the UART interface. Four Electronic Speed Controllers (ESCs) receive signals from the Pixhawk and drive the four brushless motors that serve as the flight actuators, allowing the quadcopter to move and change altitude when necessary. In addition to receiving calculations from the Jetson Nano, the flight controller also collects data from its sensors, such as the compass, gyroscope, and GPS.
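To illustrate this data flow, the sketch below outlines a minimal on-board loop of the Executive Structure in Python. The functions estimate_pose, classify_sequence, and send_command are hypothetical placeholders for the Lightweight OpenPose model, the gesture classifier, and the UART link to the Pixhawk; they are not interfaces defined in this paper, and the camera index is an assumption.

```python
import cv2

def run_executive_structure(estimate_pose, classify_sequence, send_command, seq_len=10):
    """Sketch of the companion-computer loop: camera -> pose -> classifier -> Pixhawk."""
    cap = cv2.VideoCapture(0)            # Logitech C270 assumed to be camera index 0
    buffer = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        buffer.append(estimate_pose(frame))      # 18 keypoints for the current frame
        if len(buffer) == seq_len:               # classify every ten-frame sequence
            action = classify_sequence(buffer)
            send_command(action)                 # forwarded to the flight controller
            buffer.clear()
    cap.release()
```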
B. Experimental Testbed

For this project, we chose the DJI F450 frame combined with a GPU-equipped computer, specifically a Jetson Nano with 4 GB of RAM. However, because the computational load of autonomous flight is too high for a single on-board computer, we paired it with a Pixhawk flight controller built around a 32-bit ARM Cortex-M4, improving the stability of the system. The companion computer communicates with the Pixhawk via the standard MAVLink protocol. Power comes from a Li-Po battery split into two branches: the first uses a step-down converter to supply 5 V to the Jetson Nano and the Pixhawk, while the second feeds the motors through the power-distribution circuit. To enable precise navigation, we use a Logitech camera with a 55-degree field of view, connected to the NVIDIA Jetson Nano via USB. Additionally, we use an HC-05 Bluetooth module to establish communication with the flight controller so that the quadcopter's parameters and data can be monitored in real time. The experimental quadcopter we built is presented in Fig. 3. Ten modules were set up on this platform; information about each module is shown in Table I.
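For illustration, a typical companion-computer link over MAVLink can be opened with pymavlink as sketched below. The serial device and baud rate are assumptions for a Jetson Nano UART setup, not values reported in the paper.

```python
from pymavlink import mavutil

# Open the MAVLink link to the Pixhawk (device name and baud rate are assumed).
master = mavutil.mavlink_connection("/dev/ttyTHS1", baud=57600)
master.wait_heartbeat()                               # wait for the flight controller
print(f"Connected to system {master.target_system}, component {master.target_component}")

# Read a few telemetry messages of the kind the flight controller collects (attitude, GPS).
for _ in range(5):
    msg = master.recv_match(type=["ATTITUDE", "GPS_RAW_INT"], blocking=True, timeout=5)
    if msg is not None:
        print(msg.get_type(), msg.to_dict())
```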
Fig. 3. The experimental quadcopter platform.

Table I. LIST OF MODULES IN THE EXPERIMENTAL QUADCOPTER PLATFORM

No. Hardware details
1. DC-DC 24 V/12 V to 5 V 5 A step-down (buck) power supply
2. HC-05 Bluetooth module
3. SunnySky A2212 1400 KV motors
4. LiPo 3S 5500 mAh battery
5. Beitian BN-880 flight-control GPS module with dual compass
6. 8045 multicopter propellers
7. Logitech C270 camera
8. Pixhawk 2.4.8 combo
9. Jetson Nano Developer Kit B01
10. FS-iA6B receiver

III. METHODOLOGY

A. Classification model

In this subsection, we propose an efficient action classification architecture for quadcopters, inspired by the works [7], [8]. Fig. 4 illustrates the model architecture. It consists of two key components: a backbone for extracting spatial features and a translator for learning temporal information.

The backbone utilizes a convolutional block with 1D convolutional layers (Conv1d), group normalization, and LeakyReLU activation. The translator employs four Simple Inception Attention (SIA) blocks. Each SIA block incorporates a multi-scale convolution module with kernel sizes of 3 (conv3x3) and 5 (conv5x5). It also utilizes a Depthwise Convolution (DWConv) and a Pointwise Convolution (PWConv) that together function as channel attention. Depthwise separable convolution is a technique that reduces computational cost compared to standard convolution: DWConv processes each input channel separately using a kernel size of 5, while PWConv combines information from all channels using a kernel size of 1. The computation within a single SIA block is as follows:

F1(x) = conv3x3(DWConv(x))    (1)
F2(x) = conv5x5(DWConv(x))    (2)
Attention = PWConv(F1(x) + F2(x))    (3)
Output = Attention ⊙ x    (4)

where ⊙ denotes element-wise multiplication of the attention map with the input features.

Fig. 4. The architecture of the classification model.
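As an illustration of Eqs. (1)-(4), a minimal PyTorch sketch of one SIA block is given below. The paper does not specify tensor shapes or channel counts, so the 2-D layout (keypoints x frames), the channel count, and the padding choices are our assumptions.

```python
import torch
import torch.nn as nn

class SIABlock(nn.Module):
    """Sketch of one Simple Inception Attention (SIA) block following Eqs. (1)-(4)."""
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolution: each channel is filtered separately (kernel size 5).
        self.dwconv = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # Multi-scale branches with kernel sizes 3 and 5.
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)
        # Pointwise convolution mixes information across all channels (kernel size 1).
        self.pwconv = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = self.dwconv(x)              # DWConv(x)
        f1 = self.conv3(d)              # Eq. (1)
        f2 = self.conv5(d)              # Eq. (2)
        attn = self.pwconv(f1 + f2)     # Eq. (3): channel attention map
        return attn * x                 # Eq. (4): attention applied to the input

# Example: a 64-channel feature map over an 18-keypoint x 10-frame grid (assumed shape).
x = torch.randn(1, 64, 18, 10)
print(SIABlock(64)(x).shape)            # torch.Size([1, 64, 18, 10])
```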
B. Dataset for training

Data were collected from a real-world environment, encompassing ten distinct actions, as illustrated in Fig. 5. To improve action recognition accuracy, an input sequence was generated for every ten frames. Subsequently, the dataset was divided into training, validation, and test sets with a 70:15:15 ratio. The number of samples for each action is summarized in Table II. Using the Lightweight OpenPose model [9], [10], skeletal information was extracted from the collected videos. This process yielded the coordinates of 18 key points on the body, including the ears, eyes, nose, neck, shoulders, elbows, wrists, hips, knees, and ankles.
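The paper does not detail the preprocessing code, so the sketch below only illustrates, under our own assumptions (non-overlapping ten-frame windows and a random split), how per-frame keypoints could be grouped into sequences and divided 70:15:15; random data stands in for the Lightweight OpenPose output.

```python
import numpy as np

N_KEYPOINTS, SEQ_LEN = 18, 10   # 18 body keypoints, sequences of ten frames

def window_sequences(keypoints_per_frame: np.ndarray) -> np.ndarray:
    """Group per-frame keypoints of shape (T, 18, 2) into (T // 10, 10, 18, 2) sequences."""
    usable = (len(keypoints_per_frame) // SEQ_LEN) * SEQ_LEN
    return keypoints_per_frame[:usable].reshape(-1, SEQ_LEN, N_KEYPOINTS, 2)

def split_dataset(samples: np.ndarray, labels: np.ndarray, seed: int = 0):
    """70:15:15 train/validation/test split, as described in Section III.B."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train, n_val = int(0.7 * len(idx)), int(0.15 * len(idx))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (samples[train], labels[train]), (samples[val], labels[val]), (samples[test], labels[test])

# Random stand-in for the keypoints extracted from one video (250 frames).
keypoints = np.random.rand(250, N_KEYPOINTS, 2)
sequences = window_sequences(keypoints)
print(sequences.shape)   # (25, 10, 18, 2)
```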
Fig. 5. Skeleton representations of human actions.

Table II. DATASET DISTRIBUTION FOR ACTION CLASSIFICATION

Actions       Train set  Val set  Test set  Total
Arm              454       112      111      677
Take off         430        90       91      611
Turn left        532       122      113      767
Turn right       548       112      125      785
Forward          432       101       90      623
Back             523        99      120      742
Land             586       116      114      816
Rotation 45      463        99       79      641
Rotation 90      565       113      124      802
Rotation 135     427        99       97      623

C. Loss function

We leverage the cross-entropy loss function to measure the difference between the model's predictions and the ground-truth labels. The formula is as follows:

J = -(1/N) * Σ_{i=1}^{N} y_i · log(ŷ_i)    (5)

where N represents the size of the training set, and y_i and ŷ_i denote the one-hot encoded ground-truth label and the model's predicted output for sample i, respectively.
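For reference, Eq. (5) is the standard categorical cross-entropy. The short check below is our own illustration (not the authors' training code) showing that the formula matches PyTorch's built-in criterion.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Eq. (5): J = -(1/N) * sum_i y_i · log(ŷ_i), with y_i one-hot and ŷ_i = softmax(logits)."""
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    return -(one_hot * log_probs).sum(dim=1).mean()

logits = torch.randn(4, 10)            # a batch of 4 samples, 10 action classes
targets = torch.tensor([0, 3, 7, 9])   # ground-truth class indices
assert torch.allclose(cross_entropy_loss(logits, targets), F.cross_entropy(logits, targets))
```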

D. Implementation process

The model was trained on an NVIDIA Tesla T4 GPU using the Adam optimizer [11]. Training employed 50 epochs with early stopping, a batch size of 32, and a learning rate of 1e-4. Following training, the model is converted to TensorRT for on-board deployment on the Jetson Nano embedded system for quadcopter use.
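A minimal training-and-export sketch consistent with these settings is shown below. The stand-in model, synthetic data, and early-stopping patience are our assumptions, and the trtexec command mentioned in the comment is one common way to build a TensorRT engine from ONNX, not necessarily the authors' exact toolchain.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in classifier and random 10-frame keypoint sequences so the sketch runs end to end.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(10 * 18 * 2, 10))
train_loader = DataLoader(TensorDataset(torch.randn(64, 10, 18, 2),
                                        torch.randint(0, 10, (64,))), batch_size=32)
val_loader = DataLoader(TensorDataset(torch.randn(16, 10, 18, 2),
                                      torch.randint(0, 10, (16,))), batch_size=32)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, lr = 1e-4 (Section III.D)
criterion = torch.nn.CrossEntropyLoss()
best_val, patience, bad = float("inf"), 5, 0                # patience of 5 is an assumption

for epoch in range(50):                                     # 50 epochs with early stopping
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val < best_val:
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:
            break

# Export to ONNX; on the Jetson Nano the ONNX file can then be built into a TensorRT
# engine, e.g. `trtexec --onnx=classifier.onnx --saveEngine=classifier.trt --fp16`.
torch.onnx.export(model, torch.randn(1, 10, 18, 2), "classifier.onnx", opset_version=13)
```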

IV. EXPERIMENTAL RESULTS

A. Experimental findings: a visual overview

Fig. 6 presents the results for each action using a confusion matrix, with an overall accuracy of 95.3%. Our quadcopter is able to recognize and execute ten different gestures with ten corresponding flight actions in a real flight scenario, as illustrated in Fig. 5. This demonstrates the flexibility and efficiency of the gesture-based quadcopter control system we have developed. The gestures include actions such as arm, take off, forward, backward, move left, move right, and land, which we demonstrate in Fig. 7 to Fig. 13.

Fig. 6. Confusion matrix.

Fig. 7. Arm and spin motor.
The meaning of the results in Fig. 7 to Fig. 13 is described in Table III.

TABLE III. THE MEANING OF THE RESULTS IN FIG. 7 TO FIG. 13

Fig. 7: The image depicts the quadcopter receiving an arm action, causing all four motors to rotate in place.
Fig. 8: After arming the four motors, we immediately initiate the takeoff action for five seconds to enable the quadcopter to ascend. The images are displayed in sequential order from left to right, with the camera positioned as illustrated in Fig. 14a.
Fig. 9: When the quadcopter receives the forward action, it starts to move forward. The images are arranged from left to right, with the recording camera angle as shown in Fig. 14b.
Fig. 10: When the quadcopter receives the backward action, it begins to move backwards. The images are arranged from left to right, with the recording camera angle as shown in Fig. 14b.
Fig. 11: When the quadcopter receives the 'turn right' action, it begins moving to the right. The sequence of images is arranged from left to right, with the recording camera angle as depicted in Fig. 14a.
Fig. 12: When the quadcopter receives the 'turn left' action, it begins moving to the left. The sequence of images is arranged from left to right, with the camera angle as depicted in Fig. 14a.
Fig. 13: When the quadcopter receives the 'land' action, it initiates the landing process. The sequence of images is arranged from left to right, with the camera positioned as illustrated in Fig. 14a.

Fig. 8. Quadcopter takes off using the take-off command.
Fig. 9. Quadcopter moves forward using the forward command.
Fig. 10. Quadcopter moves backwards using the back command.
Fig. 11. Quadcopter moves right using the turn-right command.
Fig. 12. Quadcopter moves left using the turn-left command.
Fig. 13. Quadcopter lands using the land command.

B. Experimental results from a geographical perspective

Mission Planner is an open-source tool used for data collection and flight control, offering a user-friendly graphical interface for planning, monitoring, and controlling flight missions. In our experiment, we used this software to record the gesture-controlled flight, following these steps: First, the arm action activated all four motors, which rotated in place so that their stability could be checked. Second, we performed the takeoff action. Third, after the quadcopter was balanced, we executed a series of operations: moving forward as shown in Fig. 15a, turning left as shown in Fig. 15b, moving backward as shown in Fig. 15c, and turning right as shown in Fig. 15d. Finally, we concluded with the landing action. Fig. 15e shows the position of the quadcopter after completing all seven control movements.
We use two PID controllers for the altitude and the position of the quadcopter, with the parameters shown in Table IV. In this experiment, we perform the flying actions in the following sequence: arm, takeoff, forward, left, backward, right, and landing. The observations are made from the map's perspective, as shown in Fig. 15. Fig. 16 to Fig. 19 are graphs describing the system's response from four perspectives, roll, pitch, yaw, and altitude, which are used to evaluate the system.

TABLE IV. PID CONTROL PARAMETERS

PID control of height Z: KP1 = 0.5, KI1 = 1, KD1 = 0
PID control of position X-Y: KP2 = 0.135, KI2 = 0.135, KD2 = 0.0036
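For illustration, the discrete PID form in which the Table IV gains would be applied is sketched below. This is a generic textbook PID, not the controller implemented in the Pixhawk firmware.

```python
class PID:
    """Minimal discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_error = 0.0, 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

altitude_pid = PID(0.5, 1.0, 0.0)         # height Z gains from Table IV
position_pid = PID(0.135, 0.135, 0.0036)  # X-Y position gains from Table IV
print(altitude_pid.update(setpoint=1.6, measurement=1.4, dt=0.02))
```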
Fig. 14. a) The camera position used to record images when the drone moves left and right. b) The camera position used to record images when the drone moves forward and backward.

Fig. 15. a) Quadcopter moves forward. b) Quadcopter moves left. c) Quadcopter moves backward. d) Quadcopter moves right. e) Quadcopter returns to its position after completing all seven control commands.

Fig. 16. Desired and response signals of altitude.

Fig. 16 demonstrates that within the first 5 seconds, the quadcopter starts ascending towards the desired altitude of 1.6 meters. However, during the initial takeoff it overshoots, reaching approximately 2 meters. From the 8th second onward, the quadcopter begins to align more accurately with the desired altitude. Approximately 50 seconds later, the quadcopter initiates its landing sequence.

Fig. 17. Desired and response signals of pitch angle.

According to Fig. 17, both the desired pitch angle (blue line) and the actual pitch angle (red line) show small fluctuations around 0 degrees in the first 15 seconds. However, at the 18th second the pitch angle becomes negative, indicating a forward motion of the quadcopter. At the 32nd second the pitch angle becomes positive, indicating a backward motion. In summary, the graph shows that while the quadcopter's pitch angle generally follows the desired signal, some errors remain.

Fig. 18. Desired and response signals of roll angle.

According to Fig. 18, during the first 23 seconds both the desired roll angle (blue line) and the actual roll angle (red line) exhibit minor fluctuations around zero degrees. However, at approximately 27 seconds the roll angle displays a negative value, indicating a leftward movement, and at 46 seconds it shows a positive value, suggesting a rightward movement. In summary, the graph demonstrates that the quadcopter's roll angle tracks the desired signal consistently throughout the flight.

According to Fig. 19, the red curve shows significant fluctuations in the yaw angle in the first 15 seconds, indicating that the quadcopter rotated 20 degrees from its initial position during takeoff. The desired yaw angle (blue line) remains stable at about 70 degrees, indicating that the target direction is consistently established and maintained. After 15 seconds, the actual yaw angle (red line) begins to stabilize and adhere more closely to the desired yaw angle. Small deviations and fluctuations around the desired yaw angle remain, but their magnitude gradually decreases over time, indicating improved stability and control. In summary, the graph shows that although the quadcopter experiences some yaw-angle instability during the initial takeoff phase, it eventually stabilizes and follows the established desired yaw angle quite well after the first 15 seconds.

Fig. 19. Desired and response signals of yaw angle.

V. CONCLUSION AND FINAL REMARKS

In conclusion, the experiments on the quadcopter have been conducted, analyzed, and discussed in this paper. This research made significant progress in developing a quadcopter control system based on gestures and a new classification model architecture. By incorporating the Jetson Nano for real-time quadcopter operations, the system demonstrated a promising accuracy rate surpassing 95%.

However, our project still has a limitation regarding the camera used in the experimental protocol. Our experimental setup uses an inexpensive camera, which results in limited image quality and visibility. In future work, we aim to upgrade to higher-quality cameras to enhance the quadcopter's operational capabilities. Additionally, our next focal point will be fine-tuning the sensor system to minimize drift while the quadcopter holds a stationary position, awaiting further actions.

Another aspect to consider is the gesture control method. Gesture control can be integrated with flight-trajectory planning instead of relying on the current basic movements. For instance, for internal deliveries within a school or residential area, the user would only need to perform one gesture, and the quadcopter would automatically follow a predefined trajectory from area A to area B. This approach promises to be a highly practical and innovative application.

ACKNOWLEDGMENT

This article belongs to the project SV2024-99, funded by Ho Chi Minh City University of Technology and Education (HCMUTE). We are grateful for this support.

REFERENCES

[1] Z. Xue and T. Gonsalves, "Vision Based Drone Obstacle Avoidance by Deep Reinforcement Learning," AI, vol. 2, pp. 366-380, 2021.
[2] L.-A. Tran, N.-P. Le, T.-D. Do, and M.-H. Le, "A Vision-based Method for Autonomous Landing on a Target with a Quadcopter," in Proc. 4th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, Vietnam, 2018.
[3] M. Y. Arafat, M. M. Alam, and S. Moh, "Vision-Based Navigation Techniques for Unmanned Aerial Vehicles: Review and Challenges," Drones, vol. 7, p. 89, 2023.
[4] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, "Champion-level drone racing using deep reinforcement learning," Nature, vol. 620, pp. 982-987, 2023.
[5] S.-Y. Shin, Y.-W. Kang, and Y.-G. Kim, "Hand Gesture-based Wearable Human-Drone Interface for Intuitive Movement Control," in Proc. 2019 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2019.
[6] A. Budiyanto, M. I. Ramadhan, I. Burhanudin, H. H. Triharminto, and B. Santoso, "Navigation control of Drone using Hand Gesture based on Complementary Filter Algorithm," Journal of Physics: Conference Series, 2021.
[7] Z. Gao, C. Tan, L. Wu, and S. Z. Li, "SimVP: Simpler yet Better Video Prediction," in Proc. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022.
[8] M.-H. Guo, C.-Z. Lu, Q. Hou, Z. Liu, M.-M. Cheng, and S.-M. Hu, "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation," arXiv preprint arXiv:2209.08575, 2022.
[9] D. Osokin, "Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose," arXiv preprint arXiv:1811.12004, 2018.
[10] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, pp. 172-186, 2021.
[11] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.
