Machine Learning for UAV-Assisted Networks
(Techniques d'Apprentissage Automatique pour les Réseaux Assistés par Drone)
Arzhang SHAHBAZI
Doctoral Thesis
NNT : 2022UPASG076
1 Introduction 7
1.1 Next Generation Strategies for UAV Communication Mobile Networks . . . . . . . . . . . 9
1.2 Machine Learning and Artificial Intelligence for UAV Networks Beyond 5G . . . . . . . . . 10
1.3 Thesis Overview and Major Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Throughput Maximization with Learning Based Trajectory for Mobile Users 41
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Learning Based Trajectory Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5 Federated Reinforcement Learning UAV Trajectory Design for Fast Localization of Ground Users 53
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.4 Federated learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7 Conclusion 85
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.2.1 Machine Learning-aided Wireless Networks . . . . . . . . . . . . . . . . . . . . . . 88
7.2.2 Federated Learning in Future Networks . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2.3 Machine Learning for Reconfigurable Intelligent Surfaces . . . . . . . . . . . . . . 91
1 - Introduction
This chapter begins with Section 1.1, which gives an overview of potential enablers for next-generation UAV communication networks together with the corresponding research challenges. Section 1.2 motivates the use of machine learning and artificial intelligence in UAV networks beyond 5G. Section 1.3 highlights the major contributions of this thesis and its organization. Section 1.4 lists the publications produced during my Ph.D. candidature.
1.1 . Next Generation Strategies for UAV Communication Mobile Networks
In recent years, rapid progress has been made in the design and improvement of Unmanned Aerial Vehicles (UAVs) of different sizes, shapes, and communication capacities. Drones can move autonomously using on-board microprocessors or can be operated from afar without any human personnel on board. Due to their adaptability, easy installation, low maintenance costs, versatility, and relatively small operating cost, drones support new commercial, military, civilian, agricultural, and environmental applications such as border surveillance, relaying for ad hoc networks, wildfire management, disaster monitoring, wind estimation, traffic monitoring, remote sensing, and search and destroy operations. Many of these applications need a single-UAV system, while others, like area monitoring in hazardous environments, demand multi-UAV systems. Although single-drone systems have been utilized for decades, exploiting a set of small UAVs has many advantages over operating and developing one large UAV. In a single-UAV system, the UAV acts as an isolated node: it can only communicate with the ground node. Consequently, the communication system is established only through UAV-to-infrastructure links, and any communication between UAVs must go through the infrastructure. The capacity of a single-UAV system is therefore restricted compared to a multi-UAV system, which has many advantages. First and foremost, tasks are generally completed at lower cost with multi-UAV systems. Additionally, the collaborative work of UAVs can enhance system performance. Moreover, if one UAV fails during a mission in a multi-UAV system, the operation can continue with the other UAVs, and tasks are generally finished more swiftly and efficiently.
Multiple UAVs can be utilized for successful and efficient mission completion, given each platform's limited capabilities, flight time, and payload. To enable cooperation, communication and networking are essential to organize multiple UAVs into an autonomous drone network. Ad hoc networks formed by multiple UAVs are one possible communication approach. In an ad hoc UAV network, only some drones are connected to the ground base, but all of the drones form an ad hoc network among themselves. In these systems, UAVs are able to communicate both with other UAVs and with the ground base. Ad hoc UAV networks can be considered a special form of Mobile Ad-hoc Network (MANET) and Vehicular Ad-hoc Network (VANET). In fact, UAV networks have some distinguishing characteristics when compared to existing ad hoc networks. Nodes in UAV networks are characterized by their high degree of mobility: VANET and MANET nodes are cars and pedestrians, respectively, whereas UAVs fly in the sky above them. The high mobility of UAVs impacts the network topology, which changes more frequently than the topology of a MANET or VANET. Furthermore, the task of a MANET or VANET is to create peer-to-peer connections. Drone networks also require peer-to-peer connections to guarantee coordination and collaboration between UAVs. In most cases, drones collect data and relay it to the ground station. Consequently, it is mandatory to ensure that both UAV-to-UAV and UAV-to-infrastructure communication are functioning, so a UAV network must support peer-to-peer communication and convergecast traffic at the same time. Moreover, distances between UAVs are much longer than between nodes in MANETs and VANETs. Thus, in an attempt to create stable communication links between UAVs, it is necessary to boost their communication range.
One of the most important design problems of multi-UAV systems is communication, which is essential for coordination and collaboration between the UAVs. UAVs can be utilized in aerial sensor networks composed of multiple data sources in different zones, where UAV nodes are used to gather information. Such a network may contain different types of sensors, and each sensor may require a different data delivery method. If different sensors are needed, they can be loaded on different UAVs, e.g., one UAV can carry an infrared camera while another is equipped with a high-resolution camera. Furthermore, UAV networks have various challenging system parameters such as limited bandwidth, high mobility, irregular connectivity, restricted transmission range, and uncertain noisy channels. These challenges introduce different issues in the ad hoc multihop environment, like collisions and transmission delays. For example, it is very demanding to maintain the transmission range between two UAVs moving in opposite directions at very high velocity. Due to the aforementioned issues, deeper investigation of UAV communication systems is necessary. Accordingly, one objective of this thesis is to identify the challenges, design characteristics, and constraints of UAV networks. Furthermore, we investigate the fundamental needs and functions of communication in UAV-based systems, and we propose various solutions that can be utilized for UAV communication systems.
1.2 . Machine Learning and Artificial Intelligence for UAV Networks Beyond 5G
future network states, thus allowing drones to adjust to the dynamics and randomness of the network in an online manner. Specifically, ML approaches permit drones to generalize their observations to unseen network states and can scale to large networks, which makes them suitable for drone applications. Furthermore, for such UAV-based applications, energy efficiency and computation capacity are major design restrictions. As a result, the main scope of this thesis is to point out the advantages that AI brings to cellular-connected UAVs under various system configurations.
An important aspect of UAV systems is to maintain reliable cellular connectivity for the UAVs at each time instant along their trajectories while also minimizing the time required to carry out their objectives. For instance, a delivery UAV must maintain a minimum signal-to-noise ratio (SNR) along its path to secure a reliable communication link for its control information. This generally depends on the UAV's location, cell association, transmit power, and the location of the serving ground users. Hence, a key challenge for a UAV system is to optimize the UAVs' path planning so as to decrease their total delivery time while maintaining reliable wireless connectivity, i.e., an instantaneous SNR threshold. Even though a centralized approach could update the trajectory plan of each UAV, this would necessitate real-time tracking of the UAVs and control signals to be transmitted to them at all time instants. Furthermore, a centralized approach incurs high round-trip latency and requires a central unit with full knowledge of the current network state. To overcome these challenges, one can implement online edge algorithms that are run individually by each UAV to plan its future path. In this respect, convolutional neural networks (CNNs) can be combined with a deep reinforcement learning (RL) algorithm based on a recurrent neural network (RNN) at the UAV level, resulting in a CNN-RNN technique. Such an algorithm exhibits dynamic temporal behavior and is characterized by its adaptive memory, which empowers it to collect the previous state information needed to estimate the future steps of each UAV. Meanwhile, CNNs are mostly used for image recognition and can consequently be used to characterize the UAV's environment by extracting features from input images. For example, CNNs help drones identify the locations of ground base stations, ground users, and other drones in the network. These extracted features are then fed to a deep RNN, which can be trained to learn an optimized sequence of the UAV's future steps that minimizes its mission time while maintaining reliable cellular coverage, based on the input features.
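The CNN-to-RNN data flow described above can be sketched in a few lines. The following is a minimal, untrained numpy illustration with random weights and toy dimensions; every name and size here is assumed for illustration, not the architecture used in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(img, kernel):
    """Valid 2-D cross-correlation, then ReLU and global average pooling."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.array([np.maximum(out, 0.0).mean()])   # one scalar feature per kernel

def rnn_step(h, x, Wh, Wx, b):
    """One vanilla RNN update: the hidden state carries past trajectory context."""
    return np.tanh(Wh @ h + Wx @ x + b)

# Toy dimensions: 3 kernels -> 3 features, hidden state of size 8, 4 candidate moves.
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]
Wh, Wx = rng.standard_normal((8, 8)), rng.standard_normal((8, 3))
b, Wout = rng.standard_normal(8), rng.standard_normal((4, 8))

def next_action(img, h):
    """CNN features from the current aerial view feed the RNN; the output scores 4 moves."""
    x = np.concatenate([conv_features(img, k) for k in kernels])
    h = rnn_step(h, x, Wh, Wx, b)
    return int(np.argmax(Wout @ h)), h
```

The sketch only shows how image features and the recurrent memory combine at each step; in a trained system the weights would be learned by the deep RL procedure described above.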
In the present thesis, motivated by the above research challenges for 5G and beyond-5G networks, we investigate the performance of Unmanned Aerial Vehicle (UAV) communication networks by using Machine Learning (ML) methods. In particular, we tackle the problem of UAV path planning while optimizing various system parameters, utilizing Reinforcement Learning (RL) to find trajectories that achieve specific system objectives. The main contributions of this thesis are as follows :
— This thesis provides a detailed introduction to the use of UAVs in wireless networks. We investigate the main use cases of UAVs and explore the key challenges and applications. Moreover, this thesis explores in detail a novel research direction in which ML methods are applied to improve the performance of UAV networks. We provide an overview of RL and the fundamentals of Federated Learning (FL).
— This thesis introduces a framework based on the likelihood of mobile users' presence in a grid with respect to their probability distribution. We model a novel UAV-assisted communication system that seeks the shortest flight path of the UAV while maximizing the amount of data transmitted to mobile devices. We use a deep reinforcement learning technique to find the trajectory that maximizes the throughput for ground mobile users. Numerical results highlight how our method strikes a balance between the achieved throughput, the trajectory, and the complexity.
— This thesis proposes an approach for localizing ground targets using Received Signal Strength (RSS) measurements and UAVs as aerial anchors. We introduce a new framework based on FL in which multiple UAVs train in different environment settings to find the optimal path, resulting in faster convergence of the RL model towards minimum localization error.
— This thesis explores Dual-Functional Radar Communication (DFRC) in UAV networks, where a single UAV serves a group of communication users and locates ground targets simultaneously. To balance communication and localization performance, we solve a multi-objective optimization problem that jointly optimizes communication throughput and localization error over a mission duration limited by the UAV's energy consumption and flying time. For this purpose, we introduce a new framework based on RL that allows the UAV to autonomously optimize its path, improving localization accuracy and maximizing the number of transmitted bits.
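The RL-based trajectory designs above share a common core that can be illustrated with tabular Q-learning on a toy grid world. Everything below (the grid, the goal cell, the reward shaping) is a hypothetical illustration of the technique, not one of the thesis algorithms:

```python
import random

random.seed(0)

GRID = 5                                        # toy 5x5 world
GOAL = (4, 4)                                   # hypothetical target/user cell
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # four candidate moves
Q = {}                                          # Q-table: (state, action) -> value

def step(state, a):
    """Apply action a, clipped to the grid; -1 per move rewards short paths."""
    x = min(max(state[0] + ACTIONS[a][0], 0), GRID - 1)
    y = min(max(state[1] + ACTIONS[a][1], 0), GRID - 1)
    nxt = (x, y)
    return nxt, (10.0 if nxt == GOAL else -1.0)

def greedy(s):
    """Action with the highest learned value in state s."""
    return max(range(4), key=lambda a: Q.get((s, a), 0.0))

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1):
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            a = random.randrange(4) if random.random() < eps else greedy(s)
            nxt, r = step(s, a)
            best_next = max(Q.get((nxt, b), 0.0) for b in range(4))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)   # Q-learning update
            s = nxt

def fly():
    """Greedy rollout after training: the learned trajectory from (0,0) to GOAL."""
    s, path = (0, 0), [(0, 0)]
    while s != GOAL and len(path) < 50:
        s, _ = step(s, greedy(s))
        path.append(s)
    return path
```

After `train()`, `fly()` returns a short path to the goal; the thesis chapters replace this toy state space and reward with throughput- and localization-driven formulations.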
1.4 . Publications
Network : A Reinforcement Learning Approach", under submission.
Abstract : In this paper, we explore the optimal trajectory for maximizing communication throughput and minimizing localization error in a Dual-Functional Radar Communication (DFRC) unmanned aerial vehicle (UAV) network, where a single UAV serves a group of communication users and locates ground targets simultaneously. To balance the communication and localization performance, we formulate a multi-objective optimization problem that jointly optimizes two objectives : maximization of the number of bits transmitted to users and minimization of the localization error for ground targets over a mission period restricted by the UAV's energy consumption or flying time. These two objectives partly conflict with each other, and weight parameters are given to describe their relative importance. Hence, in this context, we propose a novel framework based on reinforcement learning (RL) to enable the UAV to autonomously find the trajectory that improves the localization accuracy and maximizes the number of transmitted bits in the shortest time with respect to the UAV's energy consumption. We demonstrate that the proposed method significantly improves the average number of transmitted bits as well as the localization error of the network.
— Conference Papers : The following is a list of publications in refereed conference proceedings that originated from the main findings of this thesis. The conference papers [1] contain material not presented in this thesis.
— (C1) Arzhang Shahbazi and Marco Di Renzo, "Analysis of Optimal Altitude for UAV Cellular Communication in Presence of Blockage," 2021 IEEE 4th 5G World Forum (5GWF), IEEE, 2021.
works." 2021 IEEE 4th 5G World Forum (5GWF). IEEE, 2021.
2 - UAV for Next Generation of Cellular Communication - An Introduction
The use of flying platforms such as UAVs, popularly known as drones, is rapidly growing. In order to paint a clear picture of how UAVs can indeed be used as flying wireless base stations, in this chapter we provide a comprehensive study of the use of UAVs in wireless networks. Specifically, with inherent attributes such as mobility, flexibility, and adaptive altitude, UAVs enable several key applications in wireless systems. UAVs can be utilized as aerial base stations to enhance the coverage, capacity, reliability, and energy efficiency of wireless networks. They can also operate as flying mobile terminals within a cellular network. Such cellular-connected UAVs enable various applications ranging from real-time video streaming to item delivery. We study the main use cases of UAVs as aerial base stations and as cellular-connected users. For each application, we explore the key challenges and fundamental problems.
Contents
2.1 UAV Aerial Base Station in 5G and Beyond . . . . 17
2.2 Conclusion . . . . 25
2.1 . UAV Aerial Base Station in 5G and Beyond
[Figure : UAV aerial base stations serving hotspot regions, with a macro BS connected to the core network.]
mmW and potentially massive multiple-input multiple-output (MIMO) techniques can set up a whole new sort of dynamic, flying cellular network providing high-capacity wireless services, if well planned and managed.
UAVs can also reinforce terrestrial networks such as D2D and vehicular networks. For example, due to their mobility and LoS communications, drones can ease rapid information dissemination among ground devices. Moreover, drones can improve the reliability of wireless links in D2D and vehicle-to-vehicle (V2V) communications by exploiting transmit diversity. In particular, flying drones can aid in broadcasting common information to ground users, consequently decreasing the interference in ground networks by reducing the number of transmissions between users. Furthermore, UAV base stations can utilize air-to-air links to serve other cellular-connected UAV-UEs and mitigate the load on the terrestrial network. For the preceding cellular networking schemes, the use of UAVs is quite logical due to their key features given in Tables III and IV, such as agility, mobility, flexibility, and adaptive altitude. In fact, by benefiting from these unique features as well as establishing LoS communication links, UAVs can enhance the performance of existing ground wireless networks in terms of coverage, capacity, delay, and overall quality-of-service. These scenarios are certainly promising, and one can see UAVs becoming an integral part of beyond-5G cellular networks as the technology matures and new practical scenarios appear.
during public safety operations. In public safety scenarios, a reliable communication system not only improves connectivity but also saves human lives. Correspondingly, FirstNet in the United States was set up to build a nationwide, high-speed broadband wireless network for public safety communications. The potential broadband wireless technologies for public safety involve 4G long term evolution (LTE), WiFi, satellite communications, and dedicated public safety systems such as TETRA and APCO25 [5]. Nonetheless, these technologies may not supply resilience, low-latency services, and swift adaptation to the environment during natural disasters. Thus, utilizing UAV-based aerial networks is a promising solution to facilitate fast, adaptive, and reliable wireless communications in public safety scenarios. Since UAVs do not demand highly constrained and expensive infrastructure (e.g., cables), they can effortlessly fly and adaptively change their positions to supply on-demand communications to ground users in emergency situations. Moreover, because of the unique features of UAVs such as mobility, flexible deployment, and rapid reconfiguration, they can establish on-demand public safety communication networks effectively. For example, UAVs can be deployed as mobile aerial base stations in order to deliver broadband connectivity to areas with damaged terrestrial wireless infrastructure. Furthermore, flying UAVs can repeatedly maneuver to bring full coverage to a given area within the minimum possible time. Thus, utilizing UAV-mounted base stations is a suitable approach for supplying fast and ubiquitous connectivity in public safety scenarios.
devices. On the other hand, mobile UAVs can bring opportunities for transmit diversity, consequently increasing reliability and connectivity in D2D, ad hoc, and V2V networks. One practical approach for maintaining such UAV-assisted terrestrial networks is to leverage clustering of ground users. A drone can then serve each cluster by communicating directly with the cluster head, which reaches the remaining users via multi-hop communications. By applying efficient clustering approaches and exploiting drone mobility, the connectivity of terrestrial networks can be substantially improved.
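The clustering idea above can be illustrated with a plain k-means pass over ground-user coordinates, where each resulting centroid is a candidate UAV hovering point. This is a toy sketch under assumed 2-D coordinates, not the clustering scheme of any specific work:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over (x, y) user positions; centroids serve as hovering points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from k distinct users
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each user to its nearest centroid
            j = min(range(k),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                + (p[1] - centroids[j][1]) ** 2)
            clusters[j].append(p)
        for j, c in enumerate(clusters):       # move each centroid to its cluster mean
            if c:
                centroids[j] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids
```

With two well-separated groups of users and k = 2, the two centroids converge to the group means, i.e., to one natural hovering point per cluster.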
forming. Drones can also be a key enabler for mmW communications. On the one hand, drones equipped with mmW capabilities can establish LoS connections to ground users, which reduces the propagation loss experienced at high frequencies. On the other hand, one can exploit advanced MIMO approaches such as massive MIMO to operate mmW communications by mounting small-size antennas (feasible at mmW frequencies) on drones. Meanwhile, swarms of UAVs can be used to create reconfigurable antenna arrays in the sky.
enhanced by taking advantage of the unique features of drones.
It has been shown that caching at small base stations (SBSs) is a promising approach to improve communication system throughput and reduce transmission delay. However, caching at traditional static ground base stations may not be effective for covering mobile users in the presence of frequent handovers [9]. When a user moves to a new cell, its demanded content may not be available at the new base station and, thus, the user may not achieve proper coverage. To effectively serve mobile users in these cases, each demanded content would need to be cached at several base stations, which is not practical due to the signaling overhead and additional storage usage. Consequently, to increase caching efficiency, it is desirable to deploy flexible base stations that can track users' mobility and effectively deliver the demanded contents. One can therefore foresee futuristic scenarios in which UAVs, operating as flying base stations, dynamically cache popular contents, track the mobility patterns of the corresponding users and, afterwards, effectively serve them. In fact, using cache-enabled drones for traffic offloading in wireless networks is a promising method.
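As a concrete illustration of why caching popular contents pays off, content requests are commonly modeled with a Zipf popularity law. The sketch below (an assumed textbook-style model, not a scheme from this thesis) computes the hit ratio of a cache that stores the top-ranked contents:

```python
def zipf_popularity(n: int, gamma: float):
    """Zipf request probabilities for n contents ranked by popularity (rank 1 = most popular)."""
    w = [r ** (-gamma) for r in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

def cache_hit_ratio(n: int, cache_size: int, gamma: float) -> float:
    """Hit ratio when the cache stores the cache_size most popular of n contents."""
    p = zipf_popularity(n, gamma)
    return sum(p[:cache_size])
```

For a skewed popularity profile, caching even a small fraction of the catalog captures a large share of the requests, which is what makes a UAV with limited on-board storage a useful mobile cache.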
2.1.7 . Cellular-Connected Drones as User Equipments
In general, UAVs can operate as users of the wireless infrastructure. Specifically, drone-users can be utilized for surveillance, package delivery, remote sensing, and virtual reality applications. In fact, cellular-connected drones are envisioned to be a key enabler of the IoT. Recent applications of delivery-based drones include Amazon's Prime Air drone delivery service and the autonomous delivery of emergency drugs. The major benefit of drone-users is their capability to quickly move and optimize their paths to complete their objectives. To properly use UAVs as user equipments, i.e., cellular-connected drone-UEs, it is necessary to have reliable and low-latency communication between UAVs and ground BSs [10]. Indeed, to support a large-scale deployment of UAVs, a reliable wireless communication infrastructure is necessary to efficiently control the drones' movement while supporting the traffic generated by their application services. In addition to the need for ultra-low latency and reliability, drone-UEs used for surveillance purposes will need high-speed uplink connectivity to the terrestrial network and to other UAV-BSs. For this reason, current cellular networks may not be able to fully support drone-UEs, as they were planned for ground users whose operations, mobility, and traffic characteristics differ considerably from those of drone-UEs. It should be noted that there are numerous key differences between drone-UEs and terrestrial users. First, drone-UEs usually experience different channel conditions because of the nearly LoS communications between ground BSs and flying UAVs. Thus, one of the major challenges in supporting drone-UEs is the significant LoS interference caused by ground BSs. Second, in contrast to terrestrial users, the on-board energy of drone-UEs is highly restricted. Third, drone-UEs are in principle more dynamic than ground users, as they can continuously fly in any direction. Consequently, supporting cellular-connected drone-UEs in wireless networks introduces novel technical challenges and design difficulties.
— The installation and maintenance cost of small drones is lower than the cost of a large drone with complex hardware and a heavy payload.
— In FANETs, if one drone is out of service (due to weather conditions or any shortcoming in the drone system), FANET missions can still be carried on with the rest of the flying drones. This kind of flexibility is not available in a single-drone system.
geometrical statistics of various environments offered by the International Telecommunication Union (ITU-R). Specifically, for various types of environments, the ITU-R provides some environment-dependent parameters to determine the density, number, and height of the buildings (or obstacles). For example, the buildings' heights can be modeled using a Rayleigh distribution as [3] :

f(h_B) = \frac{h_B}{\lambda^2} \exp\left(-\frac{h_B^2}{2\lambda^2}\right)    (2.1)

The probability of having a LoS link with a ground user is then commonly modeled as

P_{LoS} = \frac{1}{1 + C \exp(-B[\theta - C])}    (2.2)

where C and B are constant values that depend on the environment (rural, urban, dense urban, or others) and \theta is the elevation angle in degrees. Clearly, \theta = \frac{180}{\pi} \sin^{-1}\left(\frac{h}{d}\right), with h being the UAV's altitude and d the distance between the UAV and a given ground user. In this scenario, the NLoS probability is P_{NLoS} = 1 - P_{LoS}. We note that the probabilistic path loss model in (2.2) is an example of existing A2G channel models, such as the one proposed by the 3GPP [74]. Equation (2.2) captures the fact that the probability of having a LoS connection between the aerial base station and a ground user is an increasing function of the elevation angle: as the elevation angle between the receiver and the transmitter increases, the blockage effect decreases and the communication link becomes more LoS-dominated. It is worth noting that the small-scale fading in A2G communications can be characterized by the Rician fading channel model, whose K-factor, representing the strength of the LoS component, is a function of the elevation angle and the UAV's altitude.
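The model in (2.2) is straightforward to evaluate numerically. The sketch below uses illustrative urban-like constants for B and C; the exact values are environment-dependent and must be taken from the relevant tables:

```python
import math

def p_los(h: float, d: float, B: float = 0.16, C: float = 11.95) -> float:
    """LoS probability between a UAV at altitude h and a ground user, per (2.2).

    d is the 3-D UAV-user distance (d >= h). B and C are environment-dependent
    constants; the defaults here are illustrative urban-like values.
    """
    theta = math.degrees(math.asin(h / d))               # elevation angle in degrees
    return 1.0 / (1.0 + C * math.exp(-B * (theta - C)))

def p_nlos(h: float, d: float, **kw) -> float:
    """NLoS probability is the complement of the LoS probability."""
    return 1.0 - p_los(h, d, **kw)
```

As the altitude grows relative to the horizontal distance, the elevation angle approaches 90 degrees and the LoS probability approaches 1, which matches the behavior described above.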
2.2 . Conclusion
3 - Machine Learning for UAV-Enabled Wireless Networks
In this chapter, motivated by a wide set of new applications that can benefit from drone networks, such as smart cities and aerial base station deployment, we cover in detail the new research directions in which ML techniques are utilized to increase the performance of UAV networks. Recently, AI has been growing rapidly and has been very successful, specifically due to the massive amount of available data. As a result, a significant part of the research community has started to integrate intelligence at the core of UAV networks by applying AI algorithms to several drone-related problems. In this chapter, we start with an extensive overview of unsupervised and supervised ML techniques. We then introduce RL in detail, as it has been broadly applied in UAV networks. Finally, we discuss FL principles and advantages, and where an FL approach can be used in the field of UAV networks.
Contents
3.1 Machine Learning for UAVs : An Introduction . . . . 29
3.3.2 Q-Learning . . . . 35
3.3.6 Limitation of RL . . . . 36
3.4 Conclusion . . . . 40
3.1 . Machine Learning for UAVs : An Introduction
UAVs are envisioned as one of the promising technologies for next-generation wireless communication networks. Their mobility and their capability to maintain LoS links with ground users make them a key solution for many potential applications. Similarly, artificial intelligence (AI) has been expanding swiftly over the past decade and has been very successful, especially because of the massive amount of available data. Therefore, a significant part of the research community has started to incorporate intelligence at the core of drone networks by applying AI algorithms to various UAV-related problems.
In summary, AI is one of the trending fields that brings intelligence to machines and makes them capable of completing objectives, sometimes even better than a human can. Bringing the advantages of AI into drone networks is a challenging and fascinating idea at the same time. Although conventional approaches have shown major success in solving various problems in this area, it is still interesting to study whether ML can contribute more powerful and accurate solutions. It is worth opting for AI-assisted approaches given the unprecedented success achieved by ML, especially in decision-making problems, even though moving from classical methods to intelligent approaches requires sacrificing interpretability and tractability in some scenarios.
Nonetheless, the research community believes that intelligent approaches are not always guaranteed to outperform classical methods; instead, classical approaches might offer simple and powerful solutions in some cases. In fact, this duality shows that investigating the use of AI for the set of specific problems related to UAV networks is worth pursuing. In the past, UAVs were originally designed to be controlled fully manually by a person; however, with the recent evolution of AI, it has become a trend to offer smart drones on the market. In light of this, AI can utilize the data accumulated by drone sensors to execute varied tasks. AI can also play a crucial role in resource management for drones to increase energy efficiency. The design of drone path planning and positioning is also subject to AI advancement, by equipping the drone with the capability to dodge obstacles and design its path automatically. For example, in recent years, drones that can follow users have seen huge success in the market. This kind of UAV provides high-quality video footage by following and filming its owner while being equipped with dynamic and intelligent obstacle avoidance and target tracking algorithms. Furthermore, comprehensive applications can be developed in this context, such as traffic management, surveillance, and landing site estimation. Imaging can also be enhanced for drones by applying existing state-of-the-art computer vision techniques to drone imaging.
In summary, ML is the subfield of AI that enables a computer to perform tasks accurately based on the experience gained by learning from previous trials. Indeed, ML has been very advantageous over the last decade due to the large amount of available data and powerful computers that were not accessible before.

[Figure 3.1 : The relationship between artificial intelligence, machine learning, and deep learning, with supervised, unsupervised, and reinforcement learning as the main learning paradigms.]

For this reason, research is now directed towards applying ML to drone-based problems. The field of ML can be divided into various categories of problems; as shown in Fig. 3.1, it can be split into supervised learning, unsupervised learning, and RL-based problems. In the following, we distinguish between supervised and unsupervised learning and discuss the advantages and limitations of each, before focusing on RL.
Some Supervised Algorithms and NN Architectures :
— Combined Classification and Regression Algorithms : Several supervised
algorithms can be utilized for either classification or regression. For
example, Support Vector Machine (SVM) can perform both tasks, and decision
trees can also be formulated to solve regression or classification, depending
on the use case.
— Regression Algorithms : There exist algorithms that carry out pure regression
objectives by predicting a continuous-valued output ; the classical example
is linear regression. Logistic regression, despite its name, predicts class
probabilities and is therefore better viewed as a classifier.
— Classification Algorithms : It makes sense to talk about pure classifiers in
ML. Although it is mentioned in some references that Naive Bayes classifier
with “some modification” can be used for regression, we present it as a pure
classifier example since it was derived initially for classification based on
the probabilistic Bayes theorem.
— Multi Layer Perceptron (MLP) : To imitate biological neural networks,
ANNs are mathematically formulated for ML. ANNs are built from several
partially-connected nodes, called perceptrons, grouped into different layers.
Each perceptron processes information from its input and delivers an output.
The MLP is the simplest form of
an ANN that consists of one input layer, one or more hidden layers, and
an output layer where a classification or regression task is accomplished.
— Convolutional Neural Networks (CNNs) : The CNN is another type of ANN,
designed initially for computer vision tasks. A CNN usually takes an image
as input and assigns learnable weights and biases that are updated according
to a specific training algorithm. The CNN architecture is characterized by
convolutional layers, which extract high-level features from the image for
later use. In a typical CNN architecture, feature extraction is achieved in
the first convolutional layers and classification via a fully connected layer.
— Recurrent Neural Networks (RNNs) : When the data is sequential in nature,
RNNs are used to solve the problem : consider, for example, a
text, a video, or a sound recording. RNNs are widely used in natural
language processing (NLP), in speech recognition, and for generating image
descriptions automatically. The RNN architecture is similar to that of a regular
neural network, except that it contains a loop that allows the model to carry
forward results from previous steps. An RNN in its simplest form is composed of
an output containing the prediction and a hidden state that represents the
short-term memory of the system.
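The layered structure described above can be sketched numerically. The following is a minimal, illustrative forward pass of an MLP with one hidden layer of ReLU perceptrons; the layer sizes, random weights, and softmax output are assumptions chosen for the example, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Illustrative layer sizes: 4 input features, 8 hidden perceptrons, 3 classes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def mlp_forward(x):
    """One forward pass: input layer -> hidden layer -> output layer."""
    h = relu(x @ W1 + b1)            # hidden perceptrons
    logits = h @ W2 + b2             # output layer (classification scores)
    e = np.exp(logits - logits.max())
    return e / e.sum()               # softmax turns scores into probabilities

probs = mlp_forward(rng.normal(size=4))
```

Training such a network amounts to adjusting W1, b1, W2, b2 so that the output probabilities match the labels, which is the supervised setting described above.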
the data and uncover it. For example, clustering the data, reducing data dimensionality,
and data generation are considered typical tasks for unsupervised learning.
In the following, we provide some classical unsupervised algorithms.
Unsupervised Algorithms and NN Architectures :
— Clustering Algorithms : There are a handful of popular clustering algorithms
in ML. Here, we only mention K-means, Gaussian Mixture Modeling (GMM),
DBSCAN, and agglomerative clustering. Some of these algorithms are density-based,
such as DBSCAN, while others carry out hard association, such as
K-means. It should be noted that the GMM is a probabilistic model that
uses a soft association rule.
— Dimensionality Reduction Algorithms : Dimensionality reduction is a common
method in ML consisting of transforming data from a high-dimensional
space representation to a lower-dimensional one. In this context, we mention
autoencoders (AEs), a type of neural network used to learn a compressed
representation of the data ; the architecture of an AE is remarkably simple.
We can also mention a spectral-based algorithm, principal component
analysis (PCA), as a popular dimensionality reduction technique.
— Generative Adversarial Networks (GANs) : GANs are algorithmic architec-
tures that use two neural networks in order to generate new, synthetic
instances of data that can pass for real data. They are used widely in
image generation, video generation, and voice generation.
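As an illustration of the dimensionality reduction idea above, the following is a minimal PCA sketch built directly on the singular value decomposition; the toy data set and the choice of a single retained component are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 100 points in 3-D that mostly vary along one direction.
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.05 * rng.normal(size=(100, 3))

def pca(X, k):
    """Project X onto its k leading principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # lower-dimensional representation

Z = pca(X, k=1)   # 3-D points reduced to a 1-D representation
```

The rows of Vt are the principal directions; keeping only the first k of them is exactly the high-dimensional-to-low-dimensional transformation described in the bullet above.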
cover and discuss this technique in the following sections and chapters. In addition
to the hardware and software restriction of drones mentioned above, the practical
use of ML in UAV networks still faces other significant barriers related to existing
rules and regulations. Although research is aimed at partially or even fully auto-
nomous UAV applications, most existing regulations do not allow such operations
in practice. For instance, the U.S. Federal Aviation Administration (FAA), in its
latest regulation, did not lay out a single point concerning autonomous UAVs ; it
rather focused on regulations dedicated to the human operators who control a
drone. However, it is important to mention that there is still strong anticipation
for autonomous UAVs to see the light of day. In fact, unlike the FAA, the European
Aviation Safety Agency (EASA), in its latest regulation, acknowledges the
existence of autonomous drone operations by including them and classifying them
into various classes according to the risk level of the application. Without doubt,
this will open up new opportunities for innovative UAV solutions based on ML and
AI. In summary, it is vital to harmonize and unify drone regulations around the
world, as this will motivate future research in this area.
The aim is to choose the correct actions (or policy) that maximize a predefined
reward function suited to the type of RL problem. In addition
to the five elements of RL mentioned above, another element can be present in
some scenarios : the model. Conditional on its presence, RL problems
can be branched into two main categories : model-based RL and
model-free RL. In the following, we differentiate between these two areas.
The model-based RL problem utilizes a model as the sixth element to approximate
[Figure : the RL interaction loop — the agent takes action a_k in the environment and receives the next state S_{k+1} and reward R_{k+1}.]
the behavior of the environment for the agent. Thus, the agent is capable of esti-
mating the state and the action at time T + 1 given the state and the action at time
T. At this level, supervised learning can be a powerful tool for the prediction
work. Thus, unlike model-free RL, in model-based RL the update of the
value function is based on the model and not only on experience.
In model-free RL problems, the agent cannot predict the future, and this is
the main difference with the model-based RL framework explained previously. The
actions are rather based on trial and error, where the agent, for example,
can search over the policy space, calculate the different rewards, and finally settle
on an optimal policy. A classic example of model-free RL is the Q-learning
technique, which estimates the optimal Q-value of each state-action pair and
picks the action having the highest Q-value for the current state. In short, dif-
ferentiating between model-based and model-free RL problems is a simple task :
just ask whether the agent is able to predict the next state.
If the answer is yes, then you are dealing with model-based RL ;
otherwise, it is more likely a model-free RL problem.
as an improvement to Q-learning, which utilizes a discrete state and action space
in order to build the Q-table [13]. On the other hand, the Q-values of the DQN are
approximated using an ANN by storing all the previous agent experience in a dataset
and then feeding it to the ANN to generate actions based on minimizing
a predefined loss function derived from the Bellman equation. It should also be
noted that the idea of DQN is inspired by Neural Fitted Q-learning
(NFQ), which suffered from overestimation problems and instabilities in
convergence [14]. There are many other improved variations of DQN, such as double
DQN, dueling DQN, and distributional DQN. Regardless of the phenomenal success
of DQN, specifically when it was historically tested on Atari games, it has its
own limitations, such as the fact that it cannot deal with continuous action
spaces and cannot utilize stochastic policies.
Deep Deterministic Policy Gradient (DDPG) : To overcome the limitation of
discrete actions, the Deterministic Policy Gradient (DPG) algorithm was first
introduced in DeepMind's 2014 publication, based on an off-policy Actor-Critic
method. For the sake of simplicity, let us say that Actor-Critic approaches are in
principle composed of two parts : a Critic that estimates either the action-
value or the state-value, and an Actor that updates the policy in the direction
proposed by the Critic. Later on, in 2015, and based on the DPG algorithm, a new
DRL algorithm called Deep Deterministic Policy Gradient (DDPG)
was proposed. DDPG is a model-free, off-policy technique based on the Actor-
Critic algorithm. In summary, DDPG is a DRL algorithm that helps the agent find
an optimal strategy by maximizing the reward signal. The major advantage
of this algorithm is that it functions well on high-dimensional/continuous
action spaces.
3.3.2 . Q-Learning
Motivated by its popularity among RL algorithms, we introduce Q-learning,
a classical model-free RL algorithm. Our intention in this section is to
provide a comprehensive and practical explanation of how RL can be used in UAV
path planning problems. We restrict ourselves to a basic example where a drone
flying at a fixed altitude learns how to reach a given target while achieving its
designed objective.
3.3.4 . Update Rule
The update of the Q-table is done using a fundamental equation in RL, the
Bellman equation :

Q(st, at) ← (1 − α)Q(st, at) + α [rt + γ max_a Q(st+1, a)] (3.1)

where st, at are respectively the state and the action taken at time t, α is
the learning rate, which allows the old value of the Q-table to influence current
updates, and γ is the discount factor, a measure of how future rewards
affect the system. After every picked action, the agent updates its Q-table values
using (3.1) ; afterwards, at a given state, it selects the action having the highest
Q-value.
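The update rule above can be sketched in a few lines of tabular code; the state and action counts and the values of α and γ below are illustrative assumptions, not parameters from the text:

```python
import numpy as np

n_states, n_actions = 5, 4
alpha, gamma = 0.1, 0.9
Q = np.zeros((n_states, n_actions))   # the Q-table

def q_update(s, a, r, s_next):
    """One application of the Bellman-based update rule (3.1)."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

q_update(s=0, a=2, r=1.0, s_next=1)
# With an all-zero table: Q[0, 2] = 0.9*0 + 0.1*(1 + 0.9*0) = 0.1
```

The agent then acts greedily (or ε-greedily, to keep exploring) with respect to the current row Q[s].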
3.3.6 . Limitation of RL
Unlike supervised learning, RL is the area of ML that does not require
large amounts of data to learn a new task. It rather uses the so-called "trial and
error" methodology based on an agent's past experiences. In principle, this makes
RL an extremely robust tool, specifically for drone-based problems such as trajectory
design, resource management, and scheduling, where information is sometimes
incomplete. Moreover, RL echoes supervised learning in one single point,
which is the objective of achieving full autonomy within a drone network by equip-
ping drones with the capability to autonomously make decisions in a real-time
[Figure : the FL loop — clients send encrypted gradients to the server, which performs secure aggregation, updates the global model, and sends the model update back.]
ver. It is not particularly aimed at a drone network, but at any type of network
with a central server (a base station in our scenario) and a number of clients (UAVs,
mobile users).
Here, we present a comprehensive explanation of the FL algorithm for a scenario
in which a network of UAVs is served by a terrestrial base station. As a typical
objective, we suppose that the UAVs are processing different ground images. We
also assume that the optimization of the loss function is done through a simple
stochastic gradient descent (SGD) algorithm. As illustrated in Fig. 9, the central
server, which is the base station in our case, shares the current update of the global
model, denoted by wt, with a subset of the users. The subset size, denoted by C, is
randomly selected by the server. When a client UAV receives the current update
of the global model, it utilizes its local training data to compute a local update of
the global model. The relevant parameters are the mini-batch size,
denoted by B, which indicates the amount of local data used per UAV,
the index k of the UAV, and the number of training passes each client makes over
its local dataset in each round, denoted by E. After the updating process, the
UAV only communicates the updated parameters, denoted by w_{t+1}^k, to the
base station :

w_{t+1}^k = wt − η ∇ℓ(wt, B) (3.2)

where η is the learning rate and ℓ is the loss function. For instance, if B = ∞,
the UAV performs a full-batch update and hence uses all its local data.
It then repeats (3.2) E = 10 times and delivers the output w_{t+1}^k
to the base station. Once the local update w_{t+1}^k is received by the base station,
it improves the global model and then discards these updates, because they are no
longer needed.
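The local update (3.2) followed by server-side averaging can be sketched as follows. The scalar model, the toy "one UAV = one local dataset" setup, and the underlying relation y = 3x are assumptions made purely for illustration of the federated averaging pattern:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical local datasets: each of 3 UAVs holds its own samples.
datasets = []
for _ in range(3):
    x = rng.normal(size=50)
    datasets.append((x, 3.0 * x))    # samples of the relation y = 3x

eta, E = 0.1, 10   # learning rate and number of local passes

def client_update(w, x, y):
    """E local gradient passes on the UAV's own data, as in eq. (3.2)."""
    for _ in range(E):
        grad = 2.0 * np.mean((w * x - y) * x)   # gradient of the squared loss
        w = w - eta * grad
    return w

w_global = 0.0
for t in range(20):                              # communication rounds
    local_models = [client_update(w_global, x, y) for x, y in datasets]
    w_global = float(np.mean(local_models))      # server-side aggregation
# w_global is now close to the true coefficient 3.0
```

Only the scalar updates travel over the network; the raw samples stay on each UAV, which is the privacy argument developed in the next paragraph.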
We mentioned in the previous section that FL is a promising solution
for constrained networks where extensive computation cannot be done onboard.
It permits decoupling the model training from the access to the raw data,
since it is not mandatory for drones to share any data with the
server ; instead, they only transmit their local update, as already explained. Firstly,
FL decreases privacy and security issues by minimizing data traffic over the network.
Consequently, it is considered an important approach for confidential systems where
data does not need to be shared. For instance, one can consider a recommender
system as an example of an ML application where it is necessary that raw data not
be shared between clients. In many scenarios the clients do not wish others
to know their preferences ; FL preserves this privacy by keeping the local data of
each user private and only sharing the model updates. Secondly, FL is well suited for
applications where data is unbalanced. For instance, one client may be outside the
region of interest and thus have a small amount of data in comparison with other
clients. Take the example of detecting a car by utilizing a drone's camera :
even if one of the drones is displaced in a given location where cars do
[Figure : transfer learning — knowledge from a source task is transferred to a target task.]
not often cross, that drone will still efficiently detect a car when one enters the
field of its camera. This is due to the fact that other drones communicating with
the server have been involved in the training of the displaced UAV. Furthermore,
the learning process in the FL framework can remain active even if one of the
nodes is idle. For example, if one of the drones has to recharge, perform an
emergency landing, or encounters a connectivity failure, the learning process
continues and the drone can restore the updates when it reconnects to the network.
Finally, FL performs well on non-independent and identically distributed data :
the data partition realized by a single UAV cannot be representative of the overall
information of the system, simply because the drone can only perceive a part of a
given process.
it can help UAVs learn similar strategies from a more reasonable initial network
based on model parameters previously trained [39]. As a result, the tracking task
is simplified into a set of simple sub-tasks. We can train the model to fulfill the
sub-tasks, and migrate the sub-task models to the final task through parameter-based
transfer learning, which will be explained in detail in Section 3.
3.4 . Conclusion
In this chapter, encouraged by a wide set of new applications that can benefit
from drone networks, such as smart cities and aerial base station deploy-
ment, we have covered in detail the new research directions in which ML techniques
are utilized to increase the performance of UAV networks. We began with
an extensive overview of unsupervised and supervised ML techniques. We then
introduced RL in detail, as it has been broadly applied in UAV networks. Finally,
we discussed FL principles and advantages, and where an FL approach can be used
in the field of UAV networks.
4 - Throughput Maximization with Learning
Based Trajectory for Mobile Users
In this chapter, we design a new UAV-assisted communication system relying
on the shortest flight path of the UAV while maximizing the amount of data
transmitted to mobile devices. In the considered system, we assume that the UAV
does not have knowledge of the users' locations beyond their initial positions. We
propose a framework based on the likelihood of mobile users' presence in a grid
with respect to their probability distribution. Then, a deep reinforcement learning
technique is developed for finding the trajectory that maximizes the throughput in
a specific coverage area. Numerical results are presented to highlight how our
technique strikes a balance between the achieved throughput, the trajectory, and
complexity.
Sommaire
4.1 Introduction . . . . . . . . . . . . . . . . . . . 43
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . 51
4.1 . Introduction
consider a scenario in which the UAV is connected to the GPS system and has
knowledge of the users' locations at each time instant.
The rest of this chapter is organized as follows : the system model and achie-
vable system throughput are given in Section 2. In Section 3, the mobility model
and a stochastic model for localization of users are proposed. In Section 4, the
deep reinforcement learning algorithm is utilized for obtaining the UAV's dynamic
movement when users are roaming. Numerical results are carried out in Section 5.
Finally, the chapter is concluded in Section 6.
Consider a system consisting of a single UAV and U ground users moving dyna-
mically in an area that needs to be covered. Let uu = [xu, yu]^T ∈ R^{2×1}
represent the horizontal coordinate of the u-th ground user, where u ∈ U. The 2D
Cartesian coordinate of the UAV is denoted as m = [xm, ym]^T. In practice, the
ground users receive three different kinds of signals from UAVs : LoS,
non-line-of-sight (NLoS), and multiple reflected signals. These signals occur with
specific probabilities in different environments, and the probability of the multiple-
reflected signal, which results in multi-path fading, is considerably lower than that
of the other two, so its impact at the receiver side is typically ignored. Thus, we
assume that the communication link between ground users and the UAV is dominated
by the LoS signal. Based on this assumption, the channel power gain between the
u-th user and the UAV is only a function of their Euclidean distance :

h_{u,m} = ρ0 d_{u,m}^{-2} (4.1)
Hence, we have

h_{u,m}(t) = ρ0 / (z_m^2 + ‖uu − m‖^2) (4.3)
The bit rate at time t for the u-th user can be formulated as

R_u(t) = B log2(1 + γu(t)) (4.4)

where B is the channel bandwidth and γu(t) is the signal-to-noise ratio (SNR)
corresponding to the u-th user at time t, which can be expressed as

γ_{u,m}(t) = P h_{u,m}(t) / σ^2 (4.5)
where P is the UAV transmit power and σ 2 is the power of the additive white
Gaussian noise (AWGN) at u-th user. Since users are mobile, for each user, there
are k possible locations with respect to time. So we have
Since the movement of users affects the system throughput, the UAV has to travel
based on the real-time movement of users to maximize the throughput for ground
users. Thus, to provide communication services for all ground users, we maximize
the achievable system throughput subject to the location of each user based on
their mobility model. So, we can write
max_{x_m(t), y_m(t)}  ∫_0^T ( Σ_{u=1}^U R_u^k(t) ) dt (4.8)
where Huav and Vuav are the altitude and velocity of the UAV, while Pc is the
transmit power from the UAV to the ground users. Furthermore, (9) and (10)
denote that the initial position of each user is known by the UAV ; (11) indicates
that the locations of mobile users are estimated based on their probability
distribution ; (12), (13) and (14) set constant values on the altitude, transmit
power and velocity of the UAV, respectively.
Memoryless mobility models such as the Random Walk allow mobile nodes
to move anywhere in the system with a stochastic random process for speed and
direction. Consequently, the mobility patterns are very disordered and may not
reflect the real-time scenarios of mobile ad hoc networks. In reality,
movements of mobile nodes are restricted by obstacles. Moreover, there is some
correlation between the speed, direction, path, and destination of mobile nodes to
meet their corresponding objectives. Since our objective is to let the UAV learn the
trajectory based on the mobility of users, the choice of the mobility model has a
major impact on the learned trajectory. If we consider a model in which users change
their direction or speed at each time step, the randomness in the environment is
too chaotic and there is no meaningful trajectory to be learned. Also, the border
behavior of the environment and how users react when they reach the border
Fig. 4.1 – Probability distribution of a mobile user based on the grid
model.
In this section, we describe the novel technique for localization of mobile users.
In the considered scenario, we assume that the initial positions of ground users are
known to the UAV. In our algorithm, with regard to the probability distribution
found by the grid model, the UAV makes its decision based on the most probable
grids, those with the highest probabilities. Here, because of the large action size,
we limit the choices of the UAV at each time instant to na = 4 per user. Also,
since it is not necessary for the UAV to do the estimation at each time instant, we
set a time period Ta after which the UAV estimates the locations periodically. The
localization algorithm is described in the following.
Given the location of mobile users, our goal is to obtain the optimal trajectory
of the UAV to maximize the system throughput. Reinforcement Learning (RL) has
the potential to deal with challenging and realistic models that include stochastic
movements of nodes. In general, RL is a learning approach used for finding
the optimal way of executing a task by letting an entity, named the agent, take
actions that affect its state within the acting environment. The agent improves over
time by incorporating the rewards it has received for its performance in previous
episodes [26]. In our Q-learning model, the UAV acts as the agent, and the model
consists of four parts : states, actions, rewards, and Q-values. The aim of
Q-learning is to attain a policy that maximizes the observed rewards over the
interaction time of the agent.
1. State Representation : Each state is described as (xu, yu), the
horizontal position of the UAV. As the UAV follows a trajectory
in a specific episode, the state space can be defined as xu ∈ {0, 1, ..., Xd},
yu ∈ {0, 1, ..., Yd}, where Xd and Yd are the maximum coordinates of this
particular episode.
2. Action Space : The action space A is described by all possible movement
directions, the action of remaining in the same place, and 4 possible lo-
cations for each of the mobile users. By assuming that the UAV flies with
simple coordinated turns, the actions related to the movement of the UAV
are simplified to 7 directions. Combining the actions from the dynamic
movement of the UAV and the estimation based on the grid model, the
action size equals 263.
3. State Transition Model : Considering a deterministic MDP, there is no
randomness in the transitions that follow the agent's decisions. Thus, the
next state is only affected by the action that the agent takes.
4. Rewards : The reward function is defined by the instantaneous throughput
of users. If the action that the agent carries out at the current time t
improves the throughput, the agent receives a positive reward ; otherwise,
it receives a negative reward.
Due to the size of the MDP, we create an RL agent as a feed-forward neural
network (NN) with F input neurons and Y hidden layers, each with the same
number of neurons Z, all using rectified linear units (ReLU). When receiving the
current state, described with F features as input, the NN agent outputs its
evaluation for all seven movement actions that can be taken. However, the use of
NNs in RL tasks may fail to converge, especially in problems with stochastic
environments such as ours. Therefore, we rely on deep RL and use double
Q-learning to solve our problem [27].
For the double Q-learning RL algorithm, we need to keep two separate
agents with the same properties but with different weight values wP and wT.
As such, they will output different Q-action functions when given the same
state. One is used to choose the actions, called the primary model QP(st, at),
while the other model evaluates the action during training, called the target
model QT(st, at). Training occurs when taking a batch of experiences et from
the buffer, which is used to update the model as :

QP^new = (1 − α)QP + α [rt + (1 − dt) γ max_a QT(st+1, a)] (4.15)
Hyperparameter                  Value
optimizer for SGD               Adam
learning rate for optimizer     0.0001
discount factor γ               0.99
number of hidden layers         2
number of neurons               256
minibatch size                  32
action space size               263
activation function             ReLU
replay buffer capacity          10^6
where max_a QT(st+1, a) is the value of the action chosen by the agent, α is the
learning rate, which is an input to the Adam optimizer [28], and γ is a discount
factor that reduces the impact of long-term rewards. We implement this with soft
updates : instead of waiting several episodes to replace the target model with the
primary, the target model receives continuous updates discounted by a value τ,
as in wT = wT(1 − τ) + wP τ.
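The soft target update can be sketched in a few lines; the layer shapes and the value of τ below are illustrative assumptions, not the thesis settings:

```python
import numpy as np

tau = 0.005   # soft-update coefficient (illustrative small value)

def soft_update(w_target, w_primary, tau):
    """Soft target update: w_T <- w_T * (1 - tau) + w_P * tau, per layer."""
    return [(1 - tau) * wt + tau * wp for wt, wp in zip(w_target, w_primary)]

# Illustrative 'network weights': two layers stored as numpy arrays.
w_P = [np.ones((4, 4)), np.ones(4)]      # primary model
w_T = [np.zeros((4, 4)), np.zeros(4)]    # target model
for _ in range(1000):
    w_T = soft_update(w_T, w_P, tau)     # the target slowly tracks the primary
```

Because each step blends only a fraction τ of the primary weights, the target model changes smoothly, which stabilizes the evaluation side of (4.15).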
Now, we examine how the agent makes decisions in the large action
space at each time step, and how invalid action masking and a normalized
probability distribution are used to restrict the agent from repeatedly taking
invalid actions. It has been shown that invalid action masking scales better
when the space of invalid actions is large : the agent solves the desired task,
while an invalid action penalty struggles to reach even the very first reward.
First, let us see how normalization is carried out in the discrete action
space when the UAV has to decide the location of users every tc se-
conds. For illustration purposes, consider the 4 probabilities in Fig. 6.1, which
correspond to the most probable locations for one user at time t. Thus, let us
consider an MDP with the action set A = {a0, a1, a2, a3} and S = {s, s′},
where the MDP reaches the state s′ after an action is taken in the initial
state s. Thus we have

P(s′|s, a) = [p(a0|s0), p(a1|s0), p(a2|s0), p(a3|s0)]
           = [0.094, 0.3, 0.104, 0.22] (4.16)

After normalization is enforced, we can write

P(s′|s, a) = [0.13, 0.41, 0.14, 0.3] (4.17)
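The renormalization step from (4.16) to (4.17), together with a generic invalid-action mask, can be sketched as follows; the `mask_and_normalize` helper is a hypothetical illustration of the masking idea, not part of the original algorithm:

```python
import numpy as np

# The four candidate-location probabilities from (4.16).
p = np.array([0.094, 0.3, 0.104, 0.22])

def normalize(p):
    """Renormalize a truncated distribution so it sums to one, as in (4.17)."""
    return p / p.sum()

def mask_and_normalize(p, valid):
    """Invalid action masking: zero out invalid actions, renormalize the rest."""
    masked = np.where(valid, p, 0.0)
    return masked / masked.sum()

q = normalize(p)   # matches (4.17) up to rounding
r = mask_and_normalize(p, np.array([True, True, False, True]))
```

Zeroing invalid entries before renormalizing guarantees the agent never samples an invalid action, rather than merely penalizing it after the fact.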
Now, for states in which the UAV's actions concern the coordinates of the UAV
and come from the possible directions described in Section ??, we have to mask
[Figure : learning curves over 12000 training episodes, comparing the GPS and Stochastic approaches.]
Fig. 4.3 – Trajectory obtained by the UAV for the case where four ground users
are roaming (x and y in meters).
4.5 . Conclusion
In this chapter, a DRL technique has been utilized to optimize the
flight trajectory and throughput performance of UAV-assisted networks. The
mobility of users is incorporated into the system model, and a novel approach
for estimating the location of mobile users has been studied. A learning-based
algorithm was proposed for solving the problem of maximizing the system
throughput by utilising prior knowledge of the likelihood of presence in a grid.
We designed a DRL-based movement algorithm for obtaining the trajectory
of the UAV. It is demonstrated that the proposed approach performs well
while remaining simple to implement.
5 - Federated Reinforcement Learning UAV
Trajectory Design for Fast Localization of
Ground Users
In this chapter, we study the localization of ground users by utilizing
unmanned aerial vehicles (UAVs) as aerial anchors. Specifically, we intro-
duce a novel localization framework based on Federated Learning (FL) and
Reinforcement Learning (RL). In contrast to the existing literature, our sce-
nario includes multiple UAVs learning the trajectory in different environment
settings which results in faster convergence of RL model for minimum lo-
calization error. Furthermore, to evaluate the learned trajectory from the
aggregated model, we test the trained RL agent in a fourth environment
which shows the improvement over the localization error and convergence
speed. Simulation results show that our proposed framework outperforms a
model trained with transfer learning by %30.
Sommaire
5.1 Introduction . . . . . . . . . . . . . . . . . . . 55
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . 62
5.1 . Introduction
dually been a vast live database abounding with real-time information, which
can be utilized by ML to optimize network operations and organization. It
has become an important issue to appropriately and effectively use ML tech-
niques based on data distributed over a massive mobile network. Transporting
raw data from all UAVs to a server in a huge network raises many issues,
such as network congestion, energy consumption, privacy, and security. To
avoid transporting a huge amount of distributed data to a server for conducting
centralized ML, and to preserve the privacy of users, a distributed learning
methodology without raw data transportation, such as federated learning
(FL) [33], becomes a viable solution.
In this chapter, we introduce a novel framework for ground user (GU)
localization in urban environments using UAVs. Our proposed framework in-
corporates reinforcement learning with federated learning, which enables us to
explore the optimal trajectory of the UAVs for maximum localization accu-
racy in different types of propagation environments. First, by formulating
the problem, we investigate the paths that UAVs take for minimum loca-
lization error in three environments with different parameters impacting
the path loss and the accuracy of localization. Utilizing a federated learning
technique, we aggregate these models, and finally we test the trained model
in a fourth environment. Our results show that the localization error achieved
with the same number of training episodes is 30% lower with the FL model
trained in three environments, as compared to the model transferred sequentially
from the first environment to the fourth.
The rest of this chapter is organized as follows. In Section II, we introduce
the system model and the path loss model for localization based on RSS.
Then, the machine learning framework for UAVs is introduced in Section
III. In Section IV the simulation results are presented. Finally, the work is
concluded in Section V.
In this chapter, we assume multiple UAVs flying over an urban area at a fixed
altitude h, operating as aerial anchors to localize multiple terrestrial users.
These users are equipped with a wireless communication device which
periodically broadcasts a probe request. We resort to the following
log-normal shadowing path loss model, as it is capable of modeling wireless
environments with acceptable precision [32]. We formulate the path loss as :

L = 20 log(d) + 20 log(4πf / c) + Aτ(θ) (5.1)

where d is the distance between the UAV and the ground user, f and c are
respectively the system frequency and the speed of light, and Aτ(θ) is a log-
[Figure : three local models trained with RL in (1) suburban, (2) urban, and (3) high-urban environments are aggregated into a global FL model.]
normal distributed random variable with mean µτ and variance στ²(θ),
where στ(θ) corresponds to the shadowing effect of the LoS and NLoS links
between the UAV and the ground user, and τ = {0, 1} is an indicator that
takes value 1 for a LoS link and 0 for an NLoS link. Thus we have :
στ(θ) = dτ exp(−cτ (180/π) θ) (5.4)
and PLoS(θ) is the probability of having a LoS link, written as :

PLoS(θ) = 1 / (1 + a exp(−b((180/π)θ − a))) (5.5)
Hence, the distance between the UAV and the ground user can be estimated as:

d = 10^ζ    (5.6)
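As a quick sanity check of this inversion, a minimal sketch follows (the 2.4 GHz carrier and the zero shadowing term are illustrative assumptions, not values from this chapter):

```python
import math

def path_loss_db(d, f, a_tau=0.0):
    """Path loss of Eq. (5.1): L = 20 log10(d) + 20 log10(4*pi*f/c) + A_tau."""
    c = 3e8  # speed of light [m/s]
    return 20 * math.log10(d) + 20 * math.log10(4 * math.pi * f / c) + a_tau

def distance_from_loss(L, f, a_tau=0.0):
    """Invert Eq. (5.1) as in Eq. (5.6): d = 10**zeta."""
    c = 3e8
    zeta = (L - 20 * math.log10(4 * math.pi * f / c) - a_tau) / 20
    return 10 ** zeta

# A UAV at 120 m slant distance, 2.4 GHz carrier, no shadowing:
L = path_loss_db(120.0, 2.4e9)
print(round(distance_from_loss(L, 2.4e9), 1))  # recovers 120.0
```

In practice the shadowing term A_τ(θ) is random, so the recovered distance is only an estimate; this is what motivates combining many RSS samples below.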
least squares are used to estimate the position of the user (x̂, ŷ) according to the estimated distances. In a two-dimensional space, n_i distance measurements from n_i distinct positions are taken to generate n_i circles centered at the positions where the measurements are taken, with radii equal to the respective measurements. If the distance measurements are accurate, the n_i circles intersect in one point that establishes the position of the user.
Now, given (x_i, y_i) the ground position of the UAV at sample point i, and r̂_i the distance from sample point i to the middle of the overlapping circles, we can estimate the location (x̂, ŷ) using N samples from the following minimization:
(x̂, ŷ) = argmin_{x̂,ŷ} Σ_{i=1}^{N} ( √((x_i − x̂)² + (y_i − ŷ)²) − r̂_i )²    (5.8)
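The minimization in (5.8) is nonlinear, but subtracting one circle equation from the others makes the system linear in (x̂, ŷ). The sketch below solves the resulting 2×2 normal equations in pure Python; the anchor positions and ranges are illustrative, not taken from the chapter:

```python
def estimate_position(anchors, ranges):
    """Linearized least-squares solution of Eq. (5.8): subtract the last circle
    equation to obtain a linear system in (x, y), then solve the normal equations."""
    (xn, yn), rn = anchors[-1], ranges[-1]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[:-1], ranges[:-1]):
        A.append((2 * (xi - xn), 2 * (yi - yn)))
        b.append((xi**2 - xn**2) + (yi**2 - yn**2) - (ri**2 - rn**2))
    # Normal equations: (A^T A) p = A^T b, solved in closed form for 2 unknowns
    s11 = sum(a[0] * a[0] for a in A)
    s12 = sum(a[0] * a[1] for a in A)
    s22 = sum(a[1] * a[1] for a in A)
    t1 = sum(a[0] * bi for a, bi in zip(A, b))
    t2 = sum(a[1] * bi for a, bi in zip(A, b))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Three exact range measurements intersect at the true user position (3, 4):
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [((3 - x)**2 + (4 - y)**2) ** 0.5 for x, y in anchors]
print(tuple(round(v, 6) for v in estimate_position(anchors, ranges)))  # (3.0, 4.0)
```

With noisy ranges (shadowing), the same solver returns the least-squares compromise point rather than an exact intersection, which is exactly the quantity penalized in (5.8).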
Algorithm 1: Federated averaging with DDQN.
1: Execution on Server:
2:   Initialize w_0
3:   for t = 1 to Maxrounds do
4:     M = set of UAVs
5:     for each UAV k ∈ M in parallel do
6:       w_{t+1}^k = ClientUpdate(k, w_t)
7:     end for
8:     w_{t+1} = (1/|M|) Σ_k w_{t+1}^k
9:   end for
10:  Return w_{t+1} to the UAVs.
11:
12: Execution on UAV:
13:  Construct reward function R
14:  Init: UAV position, s, Q_i, i ∈ {A, B}
15:  repeat
16:    if Localrounds < max(Localrounds) then
17:      Choose action:
18:        a = argmax_a Q_i(s, a) from Q_i (i ∈ {A, B})
19:      Receive immediate reward
20:      Update table Q_i
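The server-side aggregation step of Algorithm 1 (line 8) can be sketched as follows. The dictionary-based Q-tables and the single-table local update are simplifying assumptions for illustration; the chapter's agents use DDQN with two estimators:

```python
def client_update(q_table, transitions, alpha=0.1, gamma=0.9):
    """Local tabular Q-learning update on one UAV (simplified stand-in for DDQN)."""
    for s, a, r, s_next in transitions:
        best_next = max(q_table[s_next].values())
        q_table[s][a] += alpha * (r + gamma * best_next - q_table[s][a])
    return q_table

def fed_avg(local_models):
    """Server-side FedAvg: element-wise average, w_{t+1} = (1/|M|) * sum_k w^k_{t+1}."""
    m = len(local_models)
    return {s: {a: sum(model[s][a] for model in local_models) / m
                for a in local_models[0][s]}
            for s in local_models[0]}

# Three UAVs hold locally trained tables over the same state/action sets:
locals_ = [{"s0": {"up": 1.0, "down": 0.0}},
           {"s0": {"up": 2.0, "down": 3.0}},
           {"s0": {"up": 3.0, "down": 0.0}}]
print(fed_avg(locals_))  # {'s0': {'up': 2.0, 'down': 1.0}}
```

After aggregation, the averaged table is redistributed to all UAVs (Algorithm 1, line 10) and local training resumes from the shared model.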
In the UAV network proposed in Section II, our aim is to investigate the performance of FL over a UAV network that localizes ground users via RSS readings, which leads to continuous FL between the edge server and the UAVs. Thus, we propose an FL model over the network in Fig. 5.1 as follows. Suppose there are 3 UAVs distributed in the network whose task is to jointly learn a global model with the edge server in T training rounds. To characterize the impact of different environment parameters on the localization error, we assume each UAV operates in a different environment setting, i.e., from suburban to highly urban.
          a       b      µ1     µ0     d1      d0      c1     c0
env1    4.88    0.43    0.1    21.0   11.25   32.17   0.06   0.03
env2    9.61    0.16    1.0    20.0   10.39   29.6    0.05   0.03
env3   12.08    0.11    1.6    23.0    8.96   35.97   0.04   0.04
env4   14.32    0.08    2.3    34.0    7.37   37.08   0.03   0.03

Table 5.1 – The path loss parameters for Suburban (1), Urban (2), Dense Urban (3) and Highrise Urban (4) environments [11].
Fig. 5.2 – Localization error versus training episodes in env1 .
Comparison between FL model and baseline DDQN.
FedAvg orchestrates training with a central server which hosts the shared global model w_t, where t is the communication round. The algorithm initializes by randomly setting the global model w_0. One communication round of FedAvg can be described as follows. At the beginning, the server distributes the current global model w_t to all UAVs. After synchronizing their local models with the shared model, w_t^k ← w_t, each UAV partitions its local data into batches and performs epochs of Stochastic Gradient Descent (SGD). Finally, the UAVs upload their trained local models w_{t+1}^k to the server, which then generates the new global model w_{t+1} by computing a weighted sum of all received local models. Our approach combining FedAvg with reinforcement learning for localization is presented in Algorithm 1.
Fig. 5.3 – Localization error versus training episodes in env4: DDQN pre-trained with transfer learning, DDQN(TL), versus DDQN pre-trained with federated learning, DDQN(FL).
ment 4 based on the pre-trained model from the previous environments, and for Ne = 500 episodes with the FL model pre-trained on environments 1-3, Fig. 5.3(c). As we can see, the localization error achieved after 500 episodes of training in the fourth environment with the transfer learning pre-trained model is approximately 70 m, while with 500 episodes the FL pre-trained model reaches a localization error of 50 m. This result shows that our proposed framework reduces the localization error by roughly 30% for the same training budget and achieves better generalization performance in comparison with the transfer learning approach.
5.6 . Conclusion
6 - Multi-Objective Trajectory Design for UAV-Assisted Dual-Functional Radar-Communication Network: A Reinforcement Learning Approach
In this chapter, we explore the optimal trajectory for maximizing communication throughput and minimizing localization error in a dual-functional radar-communication (DFRC) unmanned aerial vehicle (UAV) network, where a single UAV serves a group of communication users and locates ground targets simultaneously. To balance communication and localization performance, we formulate a multi-objective optimization problem to jointly optimize two objectives: maximization of the number of bits transmitted to the users and minimization of the localization error for the ground targets over a given mission period, which is restricted by the UAV's energy consumption or flying time. These two objectives partly conflict with each other, and weight parameters are introduced to describe their relative importance. Hence, in this context, we propose a novel framework based on reinforcement learning (RL) that enables the UAV to autonomously find the trajectory that improves the localization accuracy and maximizes the number of transmitted bits in the shortest time with respect to the UAV's energy consumption. We demonstrate that the proposed method significantly improves the average number of transmitted bits, as well as the localization error of the network.
Sommaire
6.1 Introduction . . . . . . . . . . . . . . . . . . . 65
6.5 Preliminaries . . . . . . . . . . . . . . . . . . . 76
6.7 Conclusion . . . . . . . . . . . . . . . . . . . . . 84
6.1 . Introduction
In [46], a closed-form solution for optimizing the coefficients in the analog
antenna arrays to generate a multibeam for joint communication and radio
sensing was introduced. Moreover, the authors in [47] proposed a novel tech-
nique for embedding communication information into MIMO radar waveform
via sparse antenna array. In [48], the authors investigated the power mini-
mization issue in DFRC system via joint subcarrier assignment and power
allocation.
Although the advantages of alternative localization techniques such as AOA (angle of arrival), TOA (time of arrival), or TDOA (time difference of arrival) have been demonstrated in enhancing the performance of wireless networks, the received signal strength (RSS) is more attractive due to its simplicity and low cost (it requires neither extra antennas nor time synchronization) [49]. Despite its low complexity, its localization accuracy is notably affected by the randomness of the received signal and by shadowing, especially in urban areas. However, a UAV may be used to localize ground targets as an enhancement: the UAV can measure the RSS of multiple targets from different positions with a higher probability of line-of-sight (LoS), and thus better localization accuracy [11]. Furthermore, besides accurate positioning, timely localization is also crucial for many operations, such as search and rescue missions, for instance finding the locations of trapped people after a disaster or of a patient in a life-threatening condition [50]. Consequently, finding the right flight path (trajectory) is essential for both the timeliness and the accuracy of target localization. Additionally, a UAV has limited energy, which reduces its operational lifetime. Different factors such as the UAV's velocity, hovering time, and path length affect the energy consumption of the UAV, and as a result impact the localization accuracy due to fewer collected RSSI measurements. Another challenge is that the UAV does not know the number and locations of the objects before its mission; therefore, none of the existing pre-planned path algorithms from the literature are efficient for fast localization. To this end, creating an autonomous UAV that observes the environment while localizing becomes crucial [51].
In the literature, many works have studied the localization problem. In [49], the authors investigated the main factors that impact the accuracy of RSS measurements and proposed an approach to mitigate their negative effects. In [52], the authors introduced a distributed localization technique to attain high accuracy without dense deployment. In [53], new schemes (cooperative and noncooperative) based on convex optimization are designed to enhance the localization accuracy. In [54], the authors analyzed the accuracy achieved by changing the height and distance of the anchors relative to the terrestrial targets.
Furthermore, [55] proposed three different pre-determined trajectories for a mobile anchor to cover the whole area, and demonstrated that any deterministic trajectory displays significant benefits compared to a random movement. In [55], the authors proposed location verification using random anchor movement. In [56], a novel trajectory is proposed in which all deployed nodes are localized with high precision in a short time. In [57], the authors introduced a trajectory named LMAT. The authors in [58] presented a novel localization algorithm in which one mobile anchor uses the least-squares method to estimate the locations of terrestrial nodes. In [59], multiple location-aware mobile anchors localize the unknown nodes; to implement this, the authors introduced two algorithms, one to control the trajectory of the mobile anchor and another to extract the direction and distance of the unknown nodes.
Moreover, localizing ground targets using UAVs has been studied thoroughly in the literature. In [60], the authors studied the advantages of using a drone anchor. In [61], a multiple path planning algorithm based on the traveling salesman problem is proposed for a UAV to localize all target positions. Also, in [62] a triangulation-based technique that guarantees the localization precision is introduced. In [63], the authors improved the localization accuracy by equipping a UAV with directional antennas, and [64] extended the approach even further by using an omnidirectional antenna. In [31], the authors proposed a framework using RL to let a UAV traverse a trajectory that results in finding the positions of multiple ground targets with minimum average localization error under a fixed amount of UAV energy consumption, trajectory length, number of waypoints, or flying time. In [65], the authors proposed a method to localize users in disaster scenes with regions of varying importance, which may be set according to the damage and population levels. In [66], the authors studied 3-D localization via an autonomous UAV that works independently of GPS or other detectable mobile signals transmitted by the UAV; for this purpose, they utilized the existing cellular infrastructure to enable the UAV to determine its location using the locations of four surrounding base stations of the cellular network. In [67], a novel UAV-based localization and path planning approach is proposed in which the UAVs can extract one-hop neighbor information from devices that may have run out of power by using directed wireless power transfer.
To the best of our knowledge, no work has considered using a smart UAV to autonomously observe the environment and find the trajectory that results in faster multiple-object localization with minimum errors, relying only on RSS information and taking into account the variation of shadowing with the UAV elevation angle in urban areas. By leveraging the advantages of DFRC systems, the performance of communication and localization can be improved with reduced power consumption. However, a number of important issues need to be addressed, such as the path planning and speed of the UAV.
Fig. 6.1 – System model (echo signal, transmit signal, mobile device, central station, UAV, target).
In this chapter, we study a UAV-enabled DFRC system, where a single UAV is employed to simultaneously serve a group of communication users and cooperatively localize the targets in the area. We introduce a framework using reinforcement learning (RL) to optimize the operation of the UAV in urban areas. Based on the UAV's limitations, such as its energy, operational time, and speed, a Markov decision process (MDP) model is formulated. Then, the introduced RL algorithm (the double-Q-learning algorithm) provides the UAV with the artificial intelligence necessary to autonomously find the path that optimizes the communication system throughput and achieves the desired localization precision under the considered constraints. The novelty of our work lies in the fact that a smart UAV autonomously discovers the environment and identifies the path that provides the maximum communication service in terms of average throughput and the fastest multi-object localization with the desired error, relying only on RSS information and considering the variation of shadowing with the UAV elevation angle in urban areas.
The rest of this chapter is organized as follows. In Section II, we introduce the system model, the path loss model for localization based on RSS, and the power consumption model for a rotary-wing UAV. Then, in Section III, we describe the multi-objective optimization problem. The machine learning framework for UAVs is introduced in Section IV. In Section V, the simulation results are presented. Finally, the work is concluded in Section VI.
considerations. The x-y location of the UAV is denoted by (x_u, y_u). The location of the k-th ground user is given by (x_k, y_k). We resort to the following log-normal shadowing path loss model, as it is capable of modeling wireless environments with acceptable precision. We formulate the path loss as [32]:
L = 20 log(d) + 20 log(4πf/c) + A(θ)    (6.1)
where d is the distance between the receiver and the transmitter, f and c are respectively the system frequency and the speed of light, and A(θ) is a log-normal distributed random variable with mean µ and variance σ²(θ), where σ_LoS(θ) and σ_NLoS(θ) correspond respectively to the shadowing effect of the LoS and NLoS links between the UAV and the object, and a_0, b_0, a_LoS, b_LoS, a_NLoS and b_NLoS are environment-dependent parameters. Thus, the distance between the UAV and the device can be estimated as follows:
As described in [68], many localization techniques can be used in wireless networks, such as trilateration, multilateration, and triangulation. The aforementioned techniques are based on GPS, RSSI, AOA (angle of arrival), TOA (time of arrival), or TDOA (time difference of arrival) measurements to localize devices with unknown positions. RSSI-based techniques have been shown to provide an effective trade-off between accuracy, feasibility, and complexity and are thus suitable for our proposed solution approach. Once an RSSI reading is captured, it needs to be converted to distance using an appropriate channel model. Thus, by considering the path loss model of Eq. (6.1), we can write:
where P_ref and P_t denote the reflected signal power and the transmitted signal power, respectively. The received signal at the UAV coming from the reflection at the target can be defined as:
Rk = log2 (1 + γk ) (6.10)
Fig. 6.2 – Blade profile, induced, parasite, and total power versus UAV speed.
Parasite power is the power used to overcome the drag force resulting from moving through the air:

P_parasite = (1/2) ρ v³ F    (6.14)

where ρ is the air density and F is a constant that depends on the UAV drag coefficient and reference area. Note that this power is proportional to the cube of the UAV velocity v; it is zero when hovering and grows rapidly with the speed of the UAV.
Induced power is required to lift the UAV and overcome the force of gravity. Whenever a UAV is moving, the incoming airflow helps to lift it; hence, the induced power is inversely related to the airspeed. When hovering, all the airflow needed to lift the UAV has to be created by the rotor blades, which results in more power consumption. The induced power can be written as:

P_induced = m g v_i    (6.15)

where m and g respectively denote the mass of the UAV and the standard gravity, and v_i represents the mean induced velocity of the propellers in forward flight, given by:
v_i = √( (−v² + √(v⁴ + (mg/(ρA))²)) / 2 )    (6.16)
with A being the rotor disc area. In the case of hovering (i.e., when v = 0), the total power consumption reduces to the hovering power, calculated as:

P_total = P_hover = K + √( (mg)³ / (2ρA) )    (6.17)
In Fig. 6.2, we show the trend of the three power consumption components as well as the total power versus the UAV speed. As shown in the figure, at the optimal speed (around 10 m/s) the UAV consumes less power than when hovering. Thus, with a limited UAV battery, it is not always desirable to increase the number of RSS samples by hovering longer when trying to minimize the localization error.
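The power model above can be sketched as follows, using the weight, air density, and rotor disc area of Table 6.1. The drag constant F and the constant blade-profile term K are hypothetical placeholders (the source folds the blade-profile contribution into the constant K of Eq. (6.17)):

```python
import math

def propulsion_power(v, W=20.0, rho=1.225, A=0.503, F=0.01, K=80.0):
    """Rotary-wing power sketch: parasite (Eq. 6.14) + induced (W * v_i, Eq. 6.15,
    with v_i from Eq. 6.16) + a constant blade-profile term K. W = m*g [N]."""
    parasite = 0.5 * rho * v**3 * F                 # grows with v**3, zero at hover
    disc_loading = W / (rho * A)
    v_i = math.sqrt((-v**2 + math.sqrt(v**4 + disc_loading**2)) / 2)
    induced = W * v_i                               # decreases with forward speed
    return K + induced + parasite

# Forward flight at a moderate speed costs less than hovering:
print(propulsion_power(10.0) < propulsion_power(0.0))  # True
```

At v = 0 the induced term reduces to √(W³/(2ρA)), recovering the hover power of Eq. (6.17), which is the consistency check that motivates the trade-off discussed around Fig. 6.2.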
into account the constraint on the UAV energy consumption. The UAV is required to perceive the urban environment and implement real-time path planning. The decision on the UAV flight trajectory and the choice of hovering position should consider the quality of communication, the precision of localization for the ground users, and the energy consumption of the UAV. As for the number of transmitted bits, its maximization depends on the amount of data sent over the UAV mission period. It can easily be concluded that to maximize R_sum, on the one hand, the UAV should fly at a lower speed so that it has a longer flight time, which means more transmitted bits; on the other hand, the hovering location should be close to the target users so as to improve the data rate. From this aspect, hovering over the intersection of all users is the best choice. As for the minimization of the localization error, besides maximizing the number of RSS samples, we want samples to be taken by the UAV from many different positions. This may conflict with the UAV hovering directly over the intersection point of the ground users to get the maximum data rate. As for the constraints on the UAV's energy consumption and flight time, it is clear that a slower speed achieves minimum energy consumption and a longer flight time; however, it may not be fast enough to collect more RSS samples and reduce the localization error.
It is evident that these two objectives partially conflict with each other. Due to the random distribution of the devices and their dynamic number, identifying an optimal trajectory and hovering location decision is considerably complicated and may impose a significant computational cost. Moreover, since the environment is only partly observed, traditional model-based methods such as dynamic programming are incapable of solving this problem. Recently, DRL has demonstrated an excellent ability to solve complex problems and is considered one of the core technologies of machine learning. By integrating deep learning and RL, it possesses strong representation and decision-making abilities and can thus realize end-to-end learning, showing great potential in solving sophisticated network optimizations. DDQN, one of the DRL algorithms, has been proven to learn effective policies in problems with complex optimal policies and large state spaces. It is suitable for our proposed UAV flight decision problem, where the UAV operates in a stochastic environment. Since the reward of the original DDQN algorithm is a scalar, we extend it to a weighted-sum reward for the multi-objective optimization problem. The problem can be formulated as:
max_{x,y,v}  W_1 E[ Σ_{k=1}^{K} R_n[k] ] − W_2 MSE((x̂_k, ŷ_k), (x_k, y_k)), ∀k    (6.18a)
s.t.  E_total[n] ≤ λB_u    (6.18b)
      l_min ≤ x[n] ≤ l_max, ∀n    (6.18c)
      l_min ≤ y[n] ≤ l_max, ∀n    (6.18d)
      v_min ≤ v[n] ≤ v_max, ∀n    (6.18e)
      z[n] = H_u, ∀n    (6.18f)
      P_t[n] = P_u, ∀n    (6.18g)
In summary, we aim to find a control policy that can 1) maximize the system throughput; 2) minimize the localization error; and 3) ensure that the energy consumption of the UAV does not exceed the battery capacity and that the UAV is able to return safely to the recharging station. Achieving all of these objectives is quite challenging because, on the one hand, to provide effective communication the UAV should hover at an optimal position; on the other hand, to minimize the localization error the UAV should move around to different locations; and to minimize the energy consumption, UAV movements should be reduced (for energy savings). Hence, a good solution to this problem must properly address this trade-off. Furthermore, (6.18b) ensures that the UAV energy consumption does not exceed a fraction λ of the UAV on-board battery; (6.18c), (6.18d) and (6.18e) indicate the boundaries of the horizontal movements and the speed of the UAV in the environment, respectively; and (6.18f) and (6.18g) set the constraints on the UAV's altitude and transmit power, respectively.
In this section, we clarify how the UAV estimates the locations of the ground users from the received RSSI, repeatedly applying multilateration to minimize the average position errors. More specifically, we describe how to calculate e[n] in the reward function of (6.25) and how to estimate the future Q-value function Q(s_{t+1}, a_{t+1}) for the RL agent. Here, we depict the localization process for a single user; it can then be applied to the other users as well. Finally, the average localization error over all users is the metric used for the RL reward and Q-value at each state. In Fig. 6.3, we show the reduction of the localization error of a user through the multilateration technique. The user to be located is highlighted by a red dot, the UAV path by blue dashed lines, and the user's estimated location area by the shaded green zone. In the initial stage, after receiving an RSSI measurement at one time stamp, following the channel model and (6.9) in the system model section, the location of the user is estimated to lie in the shaded green zone between the inner (I_1) and outer (O_1) circles. The radii of these circles depend on the shadowing parameter and the path loss exponent. In the next stage, when the UAV moves to the next position and takes another RSSI measurement, the localization zone shrinks. Once the number of measurements reaches three, the position of the user can be estimated using trilateration, and consequently the localization error can be calculated. As the number of samples and RSSI measurements increases, the localization error correspondingly decreases.
In Fig. 6.3, we illustrate how the error for one user can be calculated using three samples. The intersection point of the three lines connecting the
Fig. 6.3 – Trilateration for the case of one node, in which the shadowing component is bounded between two values.
inner and outer circles represents the estimated position of the user. Thus, the localization error can be obtained by finding the border point farthest from the estimated point, as shown by the black line in the figure. Here, we denote the Cartesian coordinates of the estimated location of the user by (x̂, ŷ). Let (x_{s_i}, y_{s_i}) be the known ground position of the UAV at sample point i, and r̄_i = (O_i + I_i)/2 the distance from sample point i to the middle of the two circles; then the estimated position (x̂, ŷ) using M samples can be calculated from the following optimization model:
(x̂, ŷ) = argmin_{x̂,ŷ} Σ_{i=1}^{M} ( √((x_{s_i} − x̂)² + (y_{s_i} − ŷ)²) − r̄_i )²    (6.18)
Each border point of the estimated zone of the user is generated by the intersection of two RSSI circles. Fig. 6.4 shows how a border point is found. As shown in the figure, r_1 and r_2 are the radii of the circles at sample points s_1 and s_2, respectively, and k is the distance between the two sample points. P_1 and P_2 are the required intersection points between the two circles, and P_0 is the intersection point of the perpendicular line connecting P_1 and P_2 with the line k. q_1 and q_2 denote the distances from s_1 to P_0 and from P_0 to s_2, respectively. Now, if we let (x_{s_1}, y_{s_1}), (x_{s_2}, y_{s_2}), (x_{P_0}, y_{P_0}), (x_{P_1}, y_{P_1}), and (x_{P_2}, y_{P_2}) respectively denote the Cartesian coordinates of the points s_1, s_2, P_0, P_1, and P_2, then the border points are calculated through the following equations:
x_{P_1,P_2} = x_{P_0} ± h (y_{s_2} − y_{s_1}) / k    (6.19)

y_{P_1,P_2} = y_{P_0} ∓ h (x_{s_2} − x_{s_1}) / k    (6.20)
Fig. 6.4 – Intersection points P_1, P_2 of two circles of radii r_1, r_2 centered at sample points S_1, S_2, with foot point P_0 at distances q_1, q_2 and half-chord h.
where (x_{P_0}, y_{P_0}) = (x_{s_1} + (x_{s_2} − x_{s_1})q_1/k, y_{s_1} + (y_{s_2} − y_{s_1})q_1/k), q_1 = (r_1² − r_2² + k²)/(2k), and h = √(r_1² − q_1²). After a new RSSI sample is received by the UAV, the accuracy of the estimated user localization zone is updated by first removing the previous border points, then adding the new intersection points (described above), and finally computing the distances from all obtained zone points to the estimated user point; the farthest such distance is the user's localization error. After obtaining the localization errors of all ground users in the current state s_t, we average over all error values. Then, we evaluate the reward function corresponding to localization in Eq. (6.25) by dividing the localization error calculated in the current state by e_min, which is set to an arbitrary value for the minimum possible localization error, i.e., 10 m. Similarly, we estimate the future average localization errors for all available neighboring sample points and actions, update the approximated Q-value function for all actions, and store them in the table. Subsequently, for the next iteration, we choose the action that yields the highest reward by looking at the stored Q-value functions.
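The border-point geometry of (6.19)-(6.20) can be sketched directly. The helper below returns the two intersection points of a pair of RSSI circles; the circle parameters in the example are illustrative:

```python
import math

def border_points(s1, s2, r1, r2):
    """Intersection points P1, P2 of two RSSI circles (Eqs. 6.19-6.20).
    s1, s2 are sample-point coordinates; r1, r2 the range radii."""
    k = math.dist(s1, s2)                      # distance between sample points
    q1 = (r1**2 - r2**2 + k**2) / (2 * k)      # distance from s1 to foot point P0
    h = math.sqrt(r1**2 - q1**2)               # half-length of the chord P1P2
    x0 = s1[0] + (s2[0] - s1[0]) * q1 / k      # foot point P0
    y0 = s1[1] + (s2[1] - s1[1]) * q1 / k
    dx, dy = (s2[0] - s1[0]) / k, (s2[1] - s1[1]) / k
    return ((x0 + h * dy, y0 - h * dx), (x0 - h * dy, y0 + h * dx))

# Circles of radius 5 centered at (0,0) and (8,0) intersect at (4, -3) and (4, 3):
p1, p2 = border_points((0.0, 0.0), (8.0, 0.0), 5.0, 5.0)
print(p1, p2)  # (4.0, -3.0) (4.0, 3.0)
```

Iterating this over every pair of RSSI circles yields the candidate border points of the localization zone, from which the farthest point to the estimated user position gives the error used in the reward.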
6.5 . Preliminaries
which is suitable for controlling an autonomous machine such as a UAV.
RL is a learning approach used for finding the optimal way of executing a task by letting an entity, called the agent, take actions that affect its state within the environment. In RL, the environment is typically formulated as an MDP, described by a 4-tuple (S, A, R, P): a set of possible states S, a set of available actions A, a reward function R defined over S × A, and a transition probability P(ŝ|s, a) ∈ [0, 1]. The agent interacts with an unknown environment through repeated observations, actions, and rewards to construct the optimal strategy. When interacting with the environment, after choosing an action a_t ∈ A, the agent receives a reward r(s_t, a_t) and moves to the next state s_{t+1}. The goal of RL is to learn from the transition tuples and find an optimal policy π* that maximizes the cumulative sum of all future rewards. Note that the policy π = (a_1, a_2, ..., a_T) defines which action a_t should be applied at state s_t. If we let r(s_t, π(a_t)) denote the reward obtained by following policy π, the cumulative discounted sum of all future rewards under policy π is given by:
R_π = Σ_t γ^{t−1} r(s_t, π(a_t))    (6.21)

π* = argmax_{π∈Λ} R_π    (6.22)
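For illustration, the discounted return of (6.21) for a fixed reward sequence can be computed as:

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted sum of Eq. (6.21): R = sum_t gamma**(t-1) * r_t."""
    return sum(gamma ** t for t, r in enumerate(rewards) if r is not None and False) or \
        sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

The discount factor γ < 1 makes near-term rewards dominate, which is why the agent favors actions that reduce the localization error or raise the data rate early in the mission.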
Q_P^new = (1 − α) Q_P + α [ r_t + (1 − d_t) γ max_a Q_T(s_{t+1}, a) ]    (6.23)
UAV’s altitude (h) 100[m]
Rotor solidity (s) 0.05
Profile drag coefficient (δ) 0.012
UAV weight (N ) 20[N ]
Air density (ρ) 1.225[kg/m3 ]
Rotor disc area (A) 0.503[m2 ]
Rotor blade tip speed (Utip ) 120[m/s]
Induced power correction fac-
0.1
tor (k)
Environment constant for PLoS
45
( a0 )
Environment constant for PLoS
10
( b0 )
Shadowing constant for PLoS
10
(aLoS )
Shadowing constant for PLoS
2
(bLoS )
Shadowing constant for PN LoS
30
(aN LoS )
Shadowing constant for PN LoS
1.7
(bN LoS )
where max_a Q_T(s_{t+1}, a) evaluates the action chosen by the agent, α is the learning rate, which is an input to the Adam optimizer [28], and γ is a discount factor that reduces the impact of long-term rewards. We implement this with soft updates: instead of replacing the target model with the primary one after several episodes, the target model receives continuous updates discounted by the value τ, as w_T = w_T(1 − τ) + w_P τ.
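A tabular sketch of the double-Q update of (6.23) together with the soft target update follows; the dictionary-based Q-tables and list-based weights are illustrative simplifications of the neural-network version used in the chapter:

```python
import random

def double_q_step(q_a, q_b, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One tabular double-Q update in the spirit of Eq. (6.23): one table selects
    the next action, the other evaluates it (roles swapped at random)."""
    if random.random() < 0.5:
        q_a, q_b = q_b, q_a  # swap roles so both tables are updated over time
    a_star = max(q_a[s_next], key=q_a[s_next].get)      # argmax from the selector
    target = r + (0.0 if done else gamma * q_b[s_next][a_star])
    q_a[s][a] = (1 - alpha) * q_a[s][a] + alpha * target
    return q_a, q_b

def soft_update(w_target, w_primary, tau=0.01):
    """Soft target update: w_T <- (1 - tau) * w_T + tau * w_P."""
    return [(1 - tau) * wt + tau * wp for wt, wp in zip(w_target, w_primary)]

print(soft_update([0.0, 0.0], [1.0, 1.0], tau=0.1))  # [0.1, 0.1]
```

Decoupling action selection from evaluation is what curbs the overestimation bias of plain Q-learning, and the small τ keeps the target network a slowly moving copy of the primary one.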
Table 6.2 – Network Configuration

Parameters                     Values
Number of training episodes    16000
Learning rate                  0.0001
Discount factor                0.99
Replay memory size             8000
Batch size                     32
Number of neurons              256
SGD optimizer                  Adam
Activation function            ReLU
information. Specifically, the UAV can record its own location, the estimated distances and localization errors of the ground targets, the communication rates of the users, and the UAV energy consumption. Thus, the state space is defined symbolically as:
Fig. 6.5 – Accumulated reward versus number of training episodes.
Fig. 6.6 – (a) Number of transmitted bits, (b) localization error, and (c) flight time versus training episodes, for UAV speeds of 10, 20 and 30 [m/s].
Fig. 6.7 – Impact of UAV speed on (a) Number of transmitted bits and
localization error ; (b) UAV flight time.
We start by illustrating the effectiveness and convergence of the proposed DDQN algorithm. The learning curve of the trained DDQN agent is shown in Fig. 6.5, which plots the accumulated reward versus the number of training episodes. Here, the weight parameters are set to W_R = W_L = 1.0, and we consider the joint optimization of the two objectives. It can be seen in Fig. 6.5 that the agent quickly learns to obtain higher expected total rewards as training progresses, and the accumulated reward then converges steadily at a high level. During roughly the first 10000 episodes, the accumulated reward fluctuates at a very low level. This is because the UAV is still in the experience-collection stage: without enough experience to learn from, actions are chosen randomly. At the same time, the loss of the network is 0 and the objectives are not optimized. When the replay memory is full, the UAV begins to sample the stored experience tuples to train the networks. We can thus see a major exploration and learning stage up to about the 10000th episode.
The evolution of the two objectives as well as the corresponding flight duration during training is illustrated in Fig. 6.6a, Fig. 6.6b and Fig. 6.6c. We start by examining the results obtained by training the RL agent and compare different UAV speeds in terms of localization and communication performance. Fig. 6.6a depicts the number of transmitted bits achieved over the training episodes of the RL agent. As shown in the figure, the RL agent reaches convergence after 8000 episodes. The UAV operating at a speed of 30 [m/s] can transmit around 330 bits, while moving at 20 [m/s] it can transmit approximately 360 bits, and when traversing at 10 [m/s] it can transmit 400 bits during its mission. Fig. 6.6b illustrates the localization error obtained over the training episodes. We can observe that after 8000 episodes, the RL agent converges in minimizing the localization error. As the figure shows, when the UAV moves at 30 [m/s] it achieves a localization error of 28 [m], at 20 [m/s] it reaches 34 [m], and at 10 [m/s] it achieves a 40 [m] localization error. In Fig. 6.6c, we show the flight time of the UAV during training. As in the previous figures, the UAV returns to the recharging station after consuming 70% of its battery, and the RL agent reaches convergence after 8000 episodes. From the figure, we can see that the UAV has a flight time of 360, 330 and 320 [s] when moving at speeds of 10, 20 and 30 [m/s], respectively.
Fig. 6.7 summarizes the effect of the UAV speed on the three performance
metrics, with the speed varied from 10 to 40 [m/s]. When the UAV operates at
lower speeds, it consumes less energy than at the other speed settings and its
flight time is the longest, so it achieves the highest number of transmitted
bits. Conversely, when the UAV moves at higher speeds, it consumes the most
energy under the adopted propulsion power consumption model, and hence has the
lowest flight time and a low number of transmitted bits. However, moving at a
higher speed, the UAV travels the longest path and thus collects RSSI samples
from more distinct positions, which results in a better localization error.
From the figures it is clear that, with limited energy, the localization error
decreases as the UAV speed increases, while the number of transmitted bits
decreases.
In Fig. 6.8, we evaluate the performance of the RL approach by varying the
weights in the reward function (6.25). For this purpose, we tested different
weight values for the communication-rate and localization-error rewards. We
chose two sets of weight values (W1 and W2) that capture the impact of the
reward function on the trade-off between communication and localization. W1
corresponds to the scenario in which the weight of the communication reward is
larger than that of localization, and W2 is for the case in
which the weight of the localization reward is larger than that of
communication.

[Fig. 6.8: number of transmitted bits, localization error [m], and flight
time [s] versus training episodes, for the weight sets W1 and W2.]

Fig. 6.8a plots the number of transmitted signals, the localization error, and
the flight time during each training session when the UAV speed is set to
10 [m/s]. It can be seen that, after convergence, the UAV achieves a higher
number of transmitted bits per episode in the case of W1 than with W2. In the
W2 case, however, the UAV achieves better localization performance than with
W1. The flight-time difference between the two cases reflects the fact that,
when the weight of the communication reward is larger than that of the
localization reward, the UAV tends to hover rather than move to different
spots: it has found the optimal position for serving the ground users and
hovers there to maximize the system throughput.
In Fig. 6.9, we show the trade-off between communication and sensing, in light
of the discussion in the previous sections on the multi-objective optimization
of localization and system throughput. The figure plots the localization error
and the number of transmitted bits resulting from different UAV speeds and
multi-objective weights in (6.25). Whenever the number of transmitted bits
increases, the localization error increases as well, which reveals the
trade-off between the two optimization objectives. To achieve a specific
system performance, the weight values WR and WL in (6.25) and the UAV speed
can be adjusted to reach the desired communication throughput and localization
error.
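To make the role of the weights concrete, the sketch below scalarizes the two objectives as a weighted sum. The exact form of (6.25) is not reproduced in this section, so the linear combination, the normalization constants, and the weight sets W1 and W2 below are illustrative assumptions, not the thesis's actual reward.

```python
def weighted_reward(tx_bits, loc_error, w_r, w_l,
                    max_bits=400.0, max_error=40.0):
    """Scalarized multi-objective reward: the normalized throughput term is
    rewarded and the normalized localization error is penalized.
    Normalization constants are assumed for illustration."""
    return w_r * (tx_bits / max_bits) - w_l * (loc_error / max_error)

# Hypothetical weight sets: W1 favours communication, W2 favours localization.
W1 = (0.8, 0.2)
W2 = (0.2, 0.8)

# Two operating points from the trade-off: a slow UAV transmits more bits but
# has a larger localization error; a fast UAV has the opposite behaviour.
slow = (400, 40)   # tx_bits, loc_error at 10 m/s
fast = (330, 28)   # tx_bits, loc_error at 30 m/s
```

Under W1 the slow (communication-friendly) operating point scores higher, while under W2 the fast (localization-friendly) point does, which is exactly the behaviour observed in Fig. 6.8.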
6.7 . Conclusion
7 - Conclusion
This chapter highlights the general conclusions of the thesis and summarizes
possible directions for future work.
Contents
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . 86
7.2 Future Work . . . . . . . . . . . . . . . . . . . 88
7.2.1 Machine Learning-aided Wireless Networks . 88
7.2.2 Federated Learning in Future Networks . . . 90
7.2.3 Machine Learning for Reconfigurable Intelligent Surfaces . . . 91
7.1 . Conclusion
presented to highlight how our technique strikes a balance between
the achieved throughput, the trajectory, and the complexity.
7.2 . Future Work
in terms of the computation capability and hardware design. For this purpose,
by adding a confidence score to the predictions and a scale factor to the
generated actions, future developments can improve the processing time of the
learning algorithms. Additionally, one could integrate various ML techniques
to cooperatively complete the prediction procedure and thereby improve the
computational efficiency. Aside from using a larger number of samples for
optimization, it is also worthwhile to derive the optimal parameters of the
learning algorithms to achieve faster convergence. For the special scenario of
UAV swarms consisting of micro-drones with restricted capabilities, the DL
methods can run on a traditional base station (BS) with high computational
power that functions as a central manager connected to the UAV mesh network.
To determine the best possible action, this base station relies on the sensor
data from all of the drones. However, this control approach is not optimal,
because it usually introduces additional signaling overhead and transmission
latency as a consequence of the necessary information exchange between the BS
and the UAVs.
plied by the UAV BS and the energy that they utilize for recharging from the
grid. Consequently, in cases where multiple UAVs are deployed to serve as
aerial BSs, the joint consideration of physical-layer parameters and energy,
together with the application of ML algorithms such as DL to process
heterogeneous data, can yield increased performance, as prolonging the network
lifetime is a crucial feature of UAV networks.
7.2.2 . Federated Learning in Future Networks
One important factor that should not be overlooked is that FL is not applied
only to UAV or mobile-user networks; it is already used successfully in many
everyday applications. For instance, Google uses FL to train an RNN that
predicts the next word as users type on the keyboard. Nonetheless, it should
be pointed out that it is not obvious how to select certain parameters of the
FL algorithm. For instance, the client selection process is typically defined
as random, which raises the question of whether there is a better way to
assign clients in each round of the FL algorithm. This issue requires deeper
investigation for UAV networks, where several parameters can affect the client
selection process. From a wireless communication point of view, channel
quality, LoS/NLoS links, available data, and battery state are important
factors that can substantially impact client selection. In particular, these
parameters can make a subgroup of users more suitable to be chosen for FL
training. Moreover, although a large part of the research community argues
that the main goal of FL is data confidentiality, others question this
assumption and state that even sharing only updates over the wireless network
is not secure. This is partly true, since FL can be subject to malicious
attacks threatening the integrity of the model. Such attacks are known in the
ML community as backdoor attacks and are generally executed by a single node
or a group of nodes that inject wrong data into the model to negatively
influence it. Even FL remains vulnerable to this category of attacks, not
through the sending of wrong data but through malicious clients infecting the
model itself. In the future, as an advanced solution to the unreliability of
FL systems, we suggest aiding drone networks with Blockchain methods to
increase the integrity of the local models at each UAV. The combination of
Blockchain and FL is regarded as a major breakthrough, and a number of recent
research works have begun to study this topic.
It has been stated that, in addition to the increased level of stability and
integrity, the Blockchain method can boost users' motivation to participate in
training by precisely rewarding them for their contributions. Recently, the
research community has begun to combine Blockchain with FL to propose
solutions for drone networks. For instance, a secure FL framework can be
applied to mobile crowdsensing
aided by a UAV network. The local model exchanges of the FL algorithm can be
secured by means of a Blockchain architecture. In summary, we emphasize the
potential of coupling Blockchain and FL in future works. Aside from the
security issues, more attention should be paid to the convergence of an FL
algorithm, which is not always guaranteed. Convergence depends on the
particular class of problem, such as the convexity of the loss function and
the number of updates performed on the model. For instance, the optimization
of the overall model will fail if we select clients that are unavailable or do
not have enough data. Note that this problem overlaps with the client
selection issue mentioned previously and is also tied to the type of the loss
function.
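The channel-aware client selection discussed above can be sketched as a simple scoring rule that replaces uniform random sampling. The score weights and feature names below are illustrative assumptions, not a scheme proposed in the thesis.

```python
def score_client(c):
    """Heuristic suitability score for one FL round: favour clients with a
    good channel, a LoS link, enough local data, and a healthy battery.
    All weights are assumed for illustration."""
    return (0.4 * c["channel_quality"]            # normalized to [0, 1]
            + 0.2 * (1.0 if c["los"] else 0.0)    # LoS vs. NLoS link
            + 0.2 * min(c["num_samples"] / 1000.0, 1.0)
            + 0.2 * c["battery"])                 # state of charge in [0, 1]

def select_clients(clients, k):
    """Pick the k most suitable clients instead of sampling uniformly."""
    return sorted(clients, key=score_client, reverse=True)[:k]
```

In practice such a deterministic rule would be combined with some randomness to avoid starving low-score clients, which ties back to the availability and data-sufficiency issues mentioned above.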
Furthermore, as we proposed FL as a solution to train an ML model on multiple
UAVs in different environment settings, we should also mention that the
massive exchange of updates across the network introduces a heavy
communication load in the training phase, specifically for neural networks,
which induces a scalability problem for FL. Many CNN architectures require a
large number of parameters to be updated at each round. In fact, drone
networks are generally characterized by a restricted battery capacity and
limited bandwidth, which makes the UAVs unable to support this communication
load. To solve this problem, many researchers have been working on
alternatives that improve memory consumption and communication efficiency by
proposing compression techniques and reducing the number of communication
rounds. However, a drawback of FL appears when operating in a heterogeneous
UAV network formed by various types of UAVs, rotary- or fixed-wing, with
different processing capabilities and different GPUs. These dissimilarities
mean that some drones will have fast response times while others will
experience severe delays. Consequently, the induced delays significantly slow
down convergence, since the FL algorithm is expected to receive the required
model updates at each communication round. In the future, a distributed
computation scheme can be introduced to reduce the influence of slow nodes on
the convergence of gradient methods. Additionally, the quality of connectivity
can impact the convergence of the FL algorithm, because several network nodes
may encounter an unexpected failure when transmitting their local updates.
These interruptions can also reduce the overall performance of FL by slowing
the convergence speed, which should be investigated.
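One of the compression techniques mentioned above can be sketched as top-k sparsification of model updates: each UAV transmits only the k largest-magnitude entries of its update together with their indices. This is a generic illustration of the idea, not the specific scheme of any cited work.

```python
def topk_sparsify(update, k):
    """Keep only the k largest-magnitude entries of a model update.
    Returns (indices, values); only these 2*k numbers go over the air,
    instead of the full parameter vector."""
    idx = sorted(range(len(update)),
                 key=lambda i: abs(update[i]), reverse=True)[:k]
    idx.sort()  # stable wire format: indices in ascending order
    return idx, [update[i] for i in idx]

def densify(idx, vals, size):
    """Rebuild the full-size (sparse) update on the server side."""
    out = [0.0] * size
    for i, v in zip(idx, vals):
        out[i] = v
    return out
```

Production schemes additionally accumulate the dropped residuals locally across rounds so that small but persistent gradients are not lost, which matters for the convergence concerns raised above.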
7.2.3 . Machine Learning for Reconfigurable Intelligent Surfaces
Next-generation wireless networks must handle a growing density of mobile
users while accommodating a swift rise in mobile data traffic and a wide range
of services and applications. In future networks, high-frequency waves will
play a crucial role, but these signals are regularly obstructed by objects and
attenuate over long distances. Reconfigurable intelligent surfaces (RISs) are
a promising solution due to their potential to improve wireless network
capacity and coverage by intelligently changing the wireless propagation
environment. RISs are therefore a candidate technology for the sixth
generation of communication networks. To maximize the possible advantages of
RIS-assisted communication systems, ML is an effective method, since the
computational complexity of operating and deploying RISs increases rapidly as
the number of interactions between the users and the infrastructure grows.
Given that ML is a promising approach for improving the network and its
performance, the application of ML to RISs is expected to open new paths for
interdisciplinary studies as well as practical applications.
It should be noted that certain challenges must be addressed before the
advantages of RISs can be realized. Accurate channel state information (CSI)
is mandatory for optimal reflection at the RIS. For a realistic RIS-empowered
wireless network, obtaining precise CSI on a continuous basis is very
demanding because of the mobility of the served clients and the
obstruction-prone nature of the signal. Thus, the issues of CSI estimation and
of network performance optimization under imperfect CSI must be addressed to
enable real-time and effective RIS-assisted transmission. Channel estimation
complexity is high in RIS-assisted wireless networks due to the considerable
number of elements used, which is a major challenge. Furthermore, acquiring
channel knowledge may require an extensive training overhead. Moreover, the
phase shifts of the reflecting elements complicate the design of an ideal
passive beamforming system, and the conventional methods demand complicated
procedures for configuring the RIS, which are both power- and time-consuming.
Owing to their ability to learn and to operate over wide search spaces, ML
approaches have attracted attention in wireless communications, particularly
in the field of RISs. In the future, researchers must attempt to overcome
these obstacles. They can utilize various ML algorithms in the communication
sector so that the infrastructure can independently solve these challenges.
The majority of ML techniques function by learning the parameters and
constructing an optimization model from the input information for the goal
function. Nowadays, since a large amount of data must be handled, the
efficiency and effectiveness of the mathematical optimization procedures
substantially affect the popularity and applicability of ML models.
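To make the passive-beamforming problem above concrete: with perfect CSI and a single user, the classical reflection design sets each element's phase shift to cancel the phase of its cascaded channel, so that all reflected paths add coherently at the receiver. The sketch below illustrates this well-known rule from standard RIS theory; it is not a method developed in this thesis.

```python
import cmath

def optimal_phase_shifts(h, g):
    """Per-element phase shifts that co-phase the cascaded BS-RIS-user channel:
    theta_n = -(arg(h_n) + arg(g_n)), so every term h_n * e^{j theta_n} * g_n
    is real and positive, maximizing |sum_n h_n e^{j theta_n} g_n|."""
    return [-(cmath.phase(hn) + cmath.phase(gn)) for hn, gn in zip(h, g)]

def effective_gain(h, g, thetas):
    """Magnitude of the combined reflected channel for given phase shifts."""
    return abs(sum(hn * cmath.exp(1j * t) * gn
                   for hn, gn, t in zip(h, g, thetas)))
```

The optimal gain equals the sum of the per-element channel magnitudes, which is exactly why imperfect CSI (wrong `arg(h_n)`, `arg(g_n)`) directly erodes the RIS gain, motivating the CSI-estimation challenge discussed above.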
8 - Synthèse en français
Ces dernières années, des progrès rapides ont été réalisés dans la concep-
tion et l’amélioration des véhicules aériens sans pilote (drone) de différentes
tailles, formes et leurs capacités de communication. Les drones peuvent se
déplacer de manière autonome grâce à des microprocesseurs connectés ou
peuvent être exploités à distance sans nécessiter de personnel humain. En
raison de leur adaptabilité, de leur installation facile, de leurs faibles coûts
de maintenance, de leur polyvalence et de leurs coûts d’exploitation relative-
ment faibles, l’utilisation de drones prend en charge de nouvelles voies pour
les applications commerciales, militaires, civiles, agricoles et environnemen-
tales telles que la surveillance des frontières, le relais pour les réseaux ad hoc,
la gestion des incendies de forêt, la surveillance des catastrophes, l’estima-
tion du vent, la surveillance du trafic, la télédétection et les opérations de
recherche et de destruction. Beaucoup de ces applications nécessitent un seul
système drone et d’autres comme la surveillance de zone pour les environne-
ments dangereux exigent des systèmes multi- drone. Bien que les systèmes
de drones uniques soient utilisés depuis des décennies, en fonctionnant et
en développant un grand drone, l’exploitation d’un ensemble de petits drone
présente de nombreux avantages. Chaque drone agit comme un nœud isolé
dans les systèmes drone uniques, il ne peut communiquer qu’avec le nœud
au sol. Par conséquent, le système de communication drone est établi uni-
quement via une communication drone -infrastructure, et la communication
entre les drone peut être basée sur l’infrastructure. La capacité d’un seul
système drone est limitée par rapport au système multi drone qui présente
de nombreux avantages. D’abord et avant tout, les tâches sont principale-
ment accomplies à moindre coût avec les systèmes multi- drone. De plus, le
travail collaboratif des drones peut améliorer les performances du système.
De plus, si drone échoue dans une mission dans un système multi- drone,
l’opération peut continuer à exister avec les autres drone, et les tâches sont
généralement terminées plus rapidement et efficacement avec les systèmes
multi- drone.
This thesis reports new contributions to the modeling, evaluation, and optimization of next-generation UAV communication systems. In particular, the emerging technology of Machine Learning (ML), as a promising enabler for wireless communications beyond 5G, is examined and applied. More precisely, the contributions of this thesis can be summarized as follows. In the early chapters, we provided an in-depth survey of the use of UAVs in wireless networks. We studied the main use cases of UAVs as aerial base stations and as cellular-connected users. For each application, we explored the key challenges and fundamental problems. We also covered in detail the new research directions that arise when ML techniques are used to boost the performance of UAV networks. We provided a comprehensive overview of ML techniques, in particular Reinforcement Learning (RL), that have been applied to UAV networks. We then discussed the principles and benefits of Federated Learning (FL) and where an FL approach can be used in the field of UAV networks.
In one of our main works, we designed a novel UAV-assisted communication system that relies on the shortest UAV flight trajectory while maximizing the amount of data transmitted to mobile devices. In the considered system, we assumed that the UAV has no knowledge of the user's location except for its initial position. We proposed a framework based on the probability of presence of mobile users in a grid, according to their probability distribution. A deep reinforcement learning technique is then developed to find the trajectory that maximizes throughput within a specific coverage area. Numerical results were presented to highlight how our technique strikes a balance between achieved throughput, trajectory, and complexity.

In contrast to previous works, we studied the localization of ground users using UAVs as aerial anchors. More precisely, we introduced a new localization framework based on FL and RL. Unlike the existing literature, our scenario involves multiple UAVs learning the trajectory in different environments, which results in faster convergence of the RL model toward a minimal localization error. Moreover, to evaluate the trajectory learned from the aggregated model, we test the trained RL agent in a fourth environment, which shows the improvement in localization error and convergence speed. Simulation results show that our proposed framework outperforms a model trained with transfer learning by 30%.
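The aggregation step behind this FL-based framework can be sketched as standard federated averaging: each UAV trains an RL model locally in its own environment, and a global model is built as a weighted average of the local parameters. The sketch below assumes list-of-arrays model weights and the illustrative names `local_weights` and `num_samples`; it is not the thesis's exact implementation.

```python
import numpy as np

def federated_average(local_weights, num_samples):
    """FedAvg-style aggregation: average each layer's parameters
    across agents, weighted by each agent's local sample count."""
    total = sum(num_samples)
    return [
        sum(w[layer] * (n / total) for w, n in zip(local_weights, num_samples))
        for layer in range(len(local_weights[0]))
    ]

# Three UAVs, each holding a tiny two-layer model (purely illustrative).
agents = [
    [np.ones(2) * 1.0, np.ones(3) * 2.0],
    [np.ones(2) * 3.0, np.ones(3) * 4.0],
    [np.ones(2) * 5.0, np.ones(3) * 6.0],
]
global_model = federated_average(agents, num_samples=[100, 100, 100])
```

With equal sample counts this reduces to a plain per-layer mean; in the multi-environment scenario above, the aggregated model is what gets evaluated in the held-out fourth environment.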
Finally, we explored the optimal trajectory to maximize communication throughput and minimize localization error in a UAV network where a single UAV simultaneously serves a group of communication users and localizes ground targets. To balance communication and localization performance, we formulated a multi-objective optimization problem that jointly optimizes two objectives: maximizing the number of transmitted bits sent to the users and minimizing the localization error for the ground targets over a given mission period, which is limited by the UAV's energy consumption or flight time. These two objectives partially conflict with each other, and weighting parameters are introduced to describe their relative importance. In this context, we therefore proposed a new RL-based framework that allows the UAV to find its trajectory autonomously, improving localization accuracy and maximizing the number of transmitted bits in the shortest possible time with respect to the UAV's energy consumption. We demonstrated that the proposed method significantly improves the average number of transmitted bits as well as the localization error of the network.
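The weighted combination of the two conflicting objectives can be sketched as a scalarized reward of the kind commonly used in multi-objective RL. The weight `omega` and the normalization of each term below are assumptions for illustration, not the thesis's exact reward function.

```python
def scalarized_reward(bits_transmitted, localization_error,
                      max_bits, max_error, omega=0.5):
    """Weighted-sum scalarization of two conflicting objectives:
    reward normalized throughput, penalize normalized localization error.
    omega in [0, 1] encodes the relative importance of communication."""
    throughput_term = bits_transmitted / max_bits   # normalized to [0, 1]
    error_term = localization_error / max_error     # normalized to [0, 1]
    return omega * throughput_term - (1.0 - omega) * error_term

# Equal weighting: full throughput with zero localization error.
r = scalarized_reward(bits_transmitted=1e6, localization_error=0.0,
                      max_bits=1e6, max_error=10.0, omega=0.5)
```

Sweeping `omega` traces the trade-off between the two objectives: values near 1 favor throughput, values near 0 favor localization accuracy, which is how the weighting parameters mentioned above express the importance attached to each objective.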