2024TLSES121
Composition du jury
M. Eric Marie FERON, Président, King Abdullah University of Science and Technology
M. Duc-Dung NGUYEN, Rapporteur, Vietnam Academy of Science and Technology
Mme Valérie GIRARDIN, Rapporteure, Université Caen Normandie
M. Denis KOUAMÉ, Examinateur, Université Toulouse III - Paul Sabatier
M. Pierre MARÉCHAL, Directeur de thèse, Université Toulouse III - Paul Sabatier
Mme Thi-Lan LE, Co-directrice de thèse, Hanoi University of Science and Technology
Membres invités
M. Daniel Delahaye, École Nationale de l'Aviation Civile
THÈSE
En vue de l’obtention du
JURY
Valérie Girardin Professeure des Universités Rapporteur
Duc-Dung Nguyen Maître de Conférences Rapporteur
Eric Marie Feron Professeur des Universités Examinateur
Denis Kouamé Professeur des Universités Examinateur
Daniel Delahaye ENAC Membre invité
Pierre Maréchal Professeur des Universités Directeur de Thèse
Thi-Lan Le Maîtresse de Conférences Codirectrice de Thèse
Dans divers scénarios et conditions de vol, ce modèle surpasse systématique-
ment les modèles de base en termes de précision de prédiction de trajectoire.
L’utilisation de l’optimisation bayésienne rationalise non seulement le réglage
des hyperparamètres, mais donne également des résultats prometteurs en ter-
mes d’amélioration des performances prédictives.
De plus, dans l’estimation de la complexité du trafic aérien, la recon-
struction des fonctions de densité de probabilité angulaire est cruciale car
elle permet de modéliser et d’évaluer la répartition spatiale du flux de trafic
aérien. Cette approche permet d’évaluer la congestion de l’espace aérien et
de prévoir des schémas de trafic complexes, facilitant ainsi des stratégies
de gestion du trafic aérien plus efficaces. Dans cette thèse, nous proposons
d’utiliser le principe d’entropie maximale pour générer des cartes détaillées
de complexité du trafic à partir de données de trajectoires, en exploitant les
propriétés statistiques des coefficients de Fourier. La méthode est validée
par des simulations numériques, démontrant sa capacité à représenter avec
précision des distributions angulaires de scénarios simples à complexes.
À l’aide de données historiques sur les vols et le trafic aérien, la re-
construction des fonctions de densité de probabilité angulaire par maximum
d’entropie est proposée, puis utilisée pour construire des cartes de complexité
du trafic aérien. La géométrie de l’information est à la base de l’évaluation de
la complexité locale. En discrétisant l'espace aérien et en appliquant la
dualité de Fenchel pour reconstruire les densités angulaires, nous obtenons
une densité locale dans la classe des densités de von Mises généralisées.
L'indice de complexité local est alors calculé sur la base des distances
géodésiques entre la distribution obtenue et la distribution uniforme (qui
est bien sûr associée à la plus grande complexité). L'utilisation de la
divergence de Kullback-Leibler
symétrisée garantit l’efficacité des calculs, permettant une application pra-
tique dans les systèmes ATM. Ces cartes de complexité fournissent des in-
formations essentielles sur les modèles de trafic et les réponses systémiques,
soutenant la gestion stratégique.
En conclusion, cette thèse propose des approches efficaces pour résoudre
certains problèmes en ATM, notamment en matière de prédiction de trajec-
toire de vol et d’évaluation de la complexité du trafic aérien.
Mots clés : Gestion du trafic aérien, Complexité du trafic aérien, Réseaux
de neurones, Prédiction de trajectoires, Géométrie de l'information, En-
tropie maximale.
Abstract
In the context of air traffic management (ATM), the increasing demand for
airspace has exerted substantial pressure on existing systems. This grow-
ing demand necessitates the urgent development and implementation of en-
hanced safety measures and optimized airspace efficiency to accommodate
escalating traffic volumes. The complexity of air traffic is determined by
the real-time flow of traffic within a given airspace, including the number of
flights and their relative positions. Due to the constraints of airspace design,
flights must alter their trajectories to fit within these boundaries, resulting
in ongoing changes in traffic complexity. As a result, precisely measuring
the dynamic complexity of air traffic is a challenging task. This thesis aims
to explore the difficulties in flight trajectory prediction and air traffic com-
plexity estimation by leveraging machine learning methodologies, Maximum
Entropy principle and information geometry.
Predicting flight trajectories is essential in aviation, as it supports effi-
cient air traffic management and ensures safe and smooth flight operations.
Traditional prediction methods often fall short in capturing the complex
spatio-temporal dependencies and uncertainties specific to the aviation sector.
We propose a novel hybrid model, BayesCoordLSTM, which integrates
coordinate transformation and Bayesian methods into a ConvLSTM architecture.
By combining the spatial features extracted by the Convolutional Neural
Network (CNN) with the temporal dependencies identified by Long Short-Term
Memory (LSTM), the model significantly enhances trajectory prediction
accuracy. The incorporation of Bayesian methods offers probabilistic
trajectory forecasts with confidence levels, while coordinate transformation
improves spatial awareness and predictive performance. We evaluate the model
on a real flight dataset; our results demonstrate the remarkable
superiority of the BayesCoordLSTM model over existing methods. Across
various flight scenarios and conditions, this model consistently outperforms
baseline models in terms of trajectory prediction accuracy. The utilization of
Bayesian optimization not only streamlines hyperparameter tuning but also
yields promising results in terms of predictive performance enhancement.
In addition, in air traffic complexity estimation, reconstructing angu-
lar probability density functions is crucial as it helps model and assess the
spatial distribution of air traffic flow. This approach helps assess airspace
congestion and predict complex traffic patterns, facilitating more effective
air traffic management strategies. In this dissertation, we propose to use the
Maximum Entropy principle to generate detailed traffic complexity maps from
trajectory data, leveraging the statistical properties of Fourier coefficients.
The method is validated through numerical simulations, demonstrating its
ability to accurately represent angular distributions from simple to complex
scenarios.
Using historical flight and airspace traffic data, reconstructing angular
probability density functions improves predictions of congestion areas and
complexity levels. We then develop a novel methodology for constructing
air traffic complexity maps using information geometry.
By discretizing the airspace and applying the Generalized von Mises
distribution (GvM) alongside Fenchel duality, local traffic complexity indices
are calculated based on geodesic distances between distributions. The use of
symmetrized Kullback-Leibler divergence ensures computational efficiency,
enabling practical application in ATM systems. These complexity maps pro-
vide critical insights into traffic patterns and systemic responses, supporting
strategic management.
In conclusion, this dissertation proposes efficient approaches to address
key problems in ATM, especially flight trajectory prediction and air traffic
complexity assessment.
Acknowledgments
I would like to express my deepest and most heartfelt gratitude to those who
have accompanied and supported me throughout the challenging journey of
completing my PhD. This process has been an experience filled with both
intellectual and emotional trials, and during the most difficult moments, I
have been fortunate to receive unwavering encouragement and support from
many people.
First and foremost, I extend my profound thanks to my advisors, Prof.
Pierre Maréchal and Assoc. Prof. Thi-Lan Le. They have been more than
supervisors; they have been true mentors and steadfast companions on this
journey. Their guidance, encouragement, and belief in my work helped me
navigate every obstacle and bring this thesis to completion.
I am deeply grateful to my referees, Prof. Valérie Girardin and As-
soc. Prof. Duc-Dung Nguyen. Their thorough evaluation of my work, their
insightful critique, and their detailed, thoughtful feedback have had an im-
mense impact not only on this thesis but on shaping my future work. Their
contributions have been invaluable, and I am deeply thankful for the time
and effort they devoted to this process.
I also sincerely appreciate the other esteemed members of my thesis jury,
Prof. Eric Marie Feron, Prof. Denis Kouamé, and Prof. Daniel Delahaye,
for their time, insightful comments, and constructive suggestions, which have
enriched my work and inspired new ideas for future endeavors.
Special thanks to Prof. Stéphane Puechmorel and Assoc. Prof. Andrija
Vidosavljevic for their invaluable support and mentorship. I would also like
to thank my friends in Z108/Z109, who have shared countless memorable
moments over coffee and lunch breaks, bringing joy and balance to my days.
I am deeply thankful to ENAC (École Nationale de l’Aviation Civile),
University of Toulouse III - Paul Sabatier and HUST (Hanoi University of
Science and Technology) for providing exceptional research facilities, and
particularly to ENAC for the financial support that made this work possible.
I extend my appreciation to all members of the Computer Vision De-
partment, MICA Research Institute, HUST for their willingness to assist and
offer guidance whenever needed.
I would also like to thank my Vietnamese friends in France: Thinh, Nga,
Hung and Lam, who stood by me during the intense pressures of thesis com-
pletion. My heartfelt thanks go to my colleagues at Thuongmai University
for their support.
A special note of gratitude goes to the families of Christine, Anne, and
An. Their kindness and hospitality made me feel like a part of their families,
providing warmth and comfort during some of the most difficult times.
Lastly, but most importantly, I express my deepest love and appreciation
to my parents, my husband Khiem, and my children Minh and Ngoc. Their
unconditional love, patience, and belief in me have been the foundation upon
which I have built this achievement. I am also grateful to my sisters for their
constant understanding and companionship on this long journey.
Each of these people has played an irreplaceable role in my life, and I
will forever cherish their support and kindness.
Contents
Résumé i
Abstract iii
Acknowledgments v
Abbreviations xv
Publications xvii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Air traffic complexity . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Factors contributing to air traffic complexity . . . . . . 11
1.3.2 Air traffic complexity metrics . . . . . . . . . . . . . . 12
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Structure of the thesis . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . 30
2.3.1 LSTM architecture . . . . . . . . . . . . . . . . . . . . 30
2.3.2 LSTM operation . . . . . . . . . . . . . . . . . . . . . 31
2.4 Proposed model . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Coordinate transformation . . . . . . . . . . . . . . . . 34
2.4.2 CNN layer . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.3 Bayesian optimization . . . . . . . . . . . . . . . . . . 36
2.5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.1 Data preparation . . . . . . . . . . . . . . . . . . . . . 39
2.5.2 Hyperparameter tuning . . . . . . . . . . . . . . . . . . 42
2.5.3 Evaluation metrics . . . . . . . . . . . . . . . . . . . . 42
2.5.4 Comparison of experimental results . . . . . . . . . . . 43
2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5.1 Validation with Dirac distributions . . . . . . . . . . . 70
3.5.2 Numerical implementations . . . . . . . . . . . . . . . 72
3.6 Towards air traffic complexity estimation . . . . . . . . . . . . 79
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
References 138
Appendix
A Computing conjugates . . . . . . . . . . . . . . . . . .
List of Figures
3.6 Results on the Original and Optimal densities with two peaks. 74
3.7 Results on the Original and Optimal densities with five peaks. 75
3.8 Comparison of computational time by seven optimal models
with five peaks. . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1 Plots of εψβ (η) and its conjugate for β = 1 and various values
of ε. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
List of Tables
Abbreviations
Abbreviation Meaning
ADS-B Automatic Dependent Surveillance Broadcast
ATCOs Air Traffic Controllers
ATFM Air Traffic Flow Management
ATL ATLanta International Airport
ATM Air Traffic Management
BCNN Bayesian Convolutional Neural Network
BFGS Broyden-Fletcher-Goldfarb-Shanno
BNN Bayesian Neural Network
CCI Composite Complexity Index
CG Conjugate Gradient
CG3D CNN-GRU with a 3-Dimensional Convolution
CNN Convolutional Neural Network
COBYLA Constrained Optimization BY Linear Approximations
DNN Deep Neural Network
FIM Fisher Information Metric
GvM Generalized von Mises
GRU Gated Recurrent Unit
LSTM Long Short-Term Memory
MAE Mean Absolute Error
NEXTGEN NEXT GENeration Air Transportation System
PDFs Probability Density Functions
QC Qualification of Constraints
ReLU Rectified Linear Unit
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SESAR Single European Sky ATM Research
SKL Symmetrized Kullback-Leibler
TNC Truncated Newton Conjugate-Gradient
Trust-Constr Trust-Region Constrained
Publications
1 Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, “Quantitative evalua-
tion of robustness of Bayesian CNN models for image classification”,
National Conference on Some selected issues of information and com-
munication technology, Thai Nguyen, Vietnam, 2021, pp. 454-460,
ISBN 978-604-67-1744-7.
Chapter 1
Introduction
This chapter presents a comprehensive exploration of air traffic complexity,
delving into the intricate operational dynamics of air traffic management. It
includes a detailed analysis of the factors that contribute to the heightened
complexity of air traffic, as well as an intensive discussion of the metrics used
to assess this complexity. This discussion is the foundational motivation for
our study, which is presented together with the objectives and scope outlined
in the dissertation. The significance of our contributions is briefly addressed
here, with a more detailed examination to follow in subsequent chapters.
Additionally, we provide an overview of the thesis structure to clarify its
organization and progression.
1.1 Motivation
In the context of air traffic management (ATM), the growing demand for
airspace has placed unprecedented pressure on existing systems. This expo-
nential growth in airspace demands not only highlights the critical need for
enhanced safety measures but also requires efficient optimization of airspace
to accommodate increasing air traffic volumes.
Air traffic management faces numerous complex challenges that impact
operational efficiency and safety. The structure of current airspace configura-
tions, characterized by rigid layouts and fixed sector divisions, significantly
contributes to congestion and inefficiencies. This complexity encompasses
both physical aspects, such as airport layouts and airway structures, and
operational dynamics involving the intricate interactions of aircraft within
these frameworks [68].
Human factors further exacerbate these challenges, as air traffic con-
trollers (ATCOs) manage traffic within designated sectors while grappling
with cognitive limitations and the increasing complexity of traffic patterns [36].
The uneven distribution of traffic and the inflexible configuration of airspace
sectors create congestion hotspots, necessitating strategic management strate-
gies to optimize airspace usage and mitigate safety risks [63].
Within the broader context of air traffic complexity, accurate prediction
of aircraft trajectories is crucial. ATM systems must ensure precise trajec-
tory forecasts to maintain safe separation standards and prevent potential
collisions [98]. However, trajectory prediction is affected by inherent uncer-
tainties, particularly influenced by dynamic meteorological conditions such
as wind. These uncertainties arise from incomplete data and model inaccura-
cies in numerical weather prediction, which affect the velocity and precision
of trajectory forecasts [40].
The complexity of trajectory prediction is further compounded by the
nonlinear dynamics of atmospheric interactions, making it challenging to
reliably predict aircraft paths under varying conditions. Existing conflict
resolution methodologies often struggle to adequately account for these un-
certainties, risking violations of safety standards during real-time operations.
Addressing uncertainties in trajectory prediction is essential for enhanc-
ing conflict resolution methodologies and ensuring robust air traffic man-
agement. By developing methods that accurately estimate and incorporate
uncertainty into trajectory forecasts, more resilient and adaptive solutions
can be implemented to maintain safety and efficiency in increasingly con-
gested airspace environments.
The complexities of ATM systems demand innovative methodologies to
effectively address the challenges posed by escalating air traffic congestion.
Recognizing the critical role of accurate air traffic complexity analysis in
enhancing operational efficiency and safety, this thesis endeavors to introduce
novel approaches for analyzing and managing air traffic complexity.
Addressing the uncertainties inherent in ATM is imperative for devel-
oping robust and adaptive solutions that ensure safe and efficient air traffic
operations. By accurately assessing air traffic complexity, it becomes possible
to identify congestion hotspots, optimize airspace utilization, and mitigate
potential safety risks.
Through empirical analysis and theoretical inquiry, this study seeks to
provide insights into the complexities of ATM and inform the development
of strategies to address operational challenges in ATM systems. By offering
innovative methodologies for analyzing air traffic complexity, this thesis aims
to contribute to the advancement of ATM practices, ensuring the safety and
efficiency of air travel in increasingly congested airspace environments.
1.2 Background
Air traffic management (ATM) is a critical system that ensures the safety and
efficiency of air travel globally. ATM encompasses a wide range of services
including air traffic control, which manages the movement of aircraft on the
ground and in the air; air traffic flow management (ATFM), which regulates
the flow of aircraft to prevent congestion; and airspace management, which involves the
strategic organization of airspace to optimize its use [40]. Together, these
components work to maintain safe separation between aircraft and manage
the complex dynamics of air traffic.
The growth in air traffic has led to increased congestion and complexity
within airspace systems. This is particularly evident in regions with dense
air traffic, such as Europe. The dense network of flight paths in these areas
highlights the challenges faced by modern ATM systems in managing high
volumes of traffic. Such congestion necessitates the implementation of ad-
vanced methodologies to ensure aircraft maintain safe separation distances
and to prevent conflicts.
Figure 1.1 illustrates the one-day traffic density of trajectories across
Europe for two specific years: 2019 (see Figure 1.1a) and 2024 (see Figure 1.1b).
A comparative analysis of these images reveals a significant increase in traf-
fic density over the five-year period from 2019 to 2024, demonstrating the
increased complexity and congestion within European airspace.
Effective trajectory prediction is essential for managing air traffic and
preventing collisions. Predicting the future positions of aircraft allows air
traffic control to anticipate potential conflicts and take preemptive measures.
However, trajectory prediction is inherently complex due to various sources of
uncertainty, particularly those related to meteorological conditions. Winds
and other atmospheric factors can significantly affect aircraft velocity and trajectory.
Figure 1.1: One-day traffic density of trajectories across Europe: (a) 2019; (b) 2024.
Air traffic complexity is a multifaceted concept crucial for ATM, in-
volving various interdependent parameters. Despite extensive research, no
consensus on a single definition exists due to the numerous and highly cor-
related factors involved [89]. These parameters include airspace structure,
weather conditions, and technological factors. Complexity assessment tra-
ditionally relies on the subjective evaluations of air traffic controllers, but
there is a shift towards objective indicators based on various air traffic pa-
rameters [36, 89]. Key indicators include Aircraft Density, Dynamic Den-
sity, Interval Complexity, Fractal Dimension, Input/Output Approach, and
Lyapunov Exponents, each with distinct strengths and limitations [68].
Traditional airspace structures are often rigid and lack the flexibility
to adapt to the dynamic demands of modern air traffic. This rigidity can
lead to inefficiencies and potential safety hazards, particularly in high-density
traffic areas. ATCOs manage traffic within these structures, but their ability
to handle large volumes of traffic is limited by cognitive constraints and
workload [68]. This highlights the need for more adaptable and responsive
ATM strategies.
Complexity maps have emerged as valuable tools for visualizing and
managing air traffic complexity. These maps provide detailed insights into
the operational dynamics within airspace, helping to identify areas of high
traffic density and potential conflicts [93]. By analyzing complexity maps,
ATCOs and other stakeholders can make more informed decisions about
aircraft routing and sector management, potentially reducing conflict risks
and improving operational efficiency.
Recent research has focused on developing advanced metrics and method-
ologies for evaluating air traffic complexity. For example, geodesic metrics
and other mathematical models have been used to quantify the level of dis-
order in aircraft trajectories. While various approaches have been proposed
to deal with air traffic complexity, only a few address managing air traffic
so as to reduce the risk of conflicts between aircraft. In 2009, Lee et
al. [68] proposed a strategy for optimizing the entry points of aircraft
before reaching the sector boundary based on a complexity map. They showed
that a complexity map can provide a visual representation of how a system responds to
disturbances and can be used to identify key drivers of complexity and
inform decision-making. In the context of ATM, a complexity map can provide
valuable insights into the behavior of the system under different traffic con-
ditions (such as the entry point angle and heading of the entering aircraft)
and environmental factors (like convective weather), and help identify areas
where improvements can be made. By analyzing the complexity map, the
air traffic control system can identify areas of high traffic density or other
factors that may increase the risk of conflicts between aircraft [93]. In this
research, complexity maps are considered as an air traffic evaluation approach
that illustrates how to manage a sector's conflicts based on entering
aircraft. Unlike other approaches, Hong et al. proposed a new method called
Variance Quadtree-Based Map Building to manage the conflict by determin-
ing the best strategy when adding an aircraft into a sector. This can help
improve the safety and efficiency of air traffic control operations, as well as
reduce delays and improve overall airspace capacity [57]. In 2020, Juntama et
al. [63] used the König metric to build a complexity map to reduce and optimize
air traffic for strategic 4D trajectory planning. Delahaye et al. [36] quantify
the air traffic complexity of a set of aircraft trajectories by approximating a
metric based on a linear dynamical system, fitting a vector field as closely as
possible to the observed aircraft positions and speeds. This system can then
evaluate the local disorder of a set of trajectories in the neighborhood of
a given aircraft at a given time from the eigenvalues of the associated matrix A;
organized traffic situations correspond to eigenvalues lying in a vertical strip
around the imaginary axis. It provides a nuanced understanding of how aircraft
interact within congested airspace.
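To make this concrete, the linear-dynamical-system idea can be sketched in a few lines. This is a minimal illustration under our own assumptions, with hypothetical function and variable names; it is not the implementation of [36]. Given observed positions and velocities, one fits v ≈ Ax + b by least squares and examines the eigenvalues of A:

```python
import numpy as np

def local_disorder(positions, velocities):
    """Fit the linear dynamical model v = A x + b by least squares,
    then return the eigenvalues of A. Eigenvalues lying close to the
    imaginary axis (small real parts) suggest organized traffic."""
    X = np.hstack([positions, np.ones((len(positions), 1))])  # design matrix [x | 1]
    coeffs, *_ = np.linalg.lstsq(X, velocities, rcond=None)   # rows of coeffs: A^T then b
    A = coeffs[:-1].T                                          # recover A
    return np.linalg.eigvals(A)

# Toy example: pure rotation (organized traffic) -> eigenvalues +/- i*omega
rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, size=(50, 2))
omega = 0.1
vel = pos @ np.array([[0.0, -omega], [omega, 0.0]]).T  # v = A x with rotational A
eig = local_disorder(pos, vel)
print(np.round(eig.real, 6))  # real parts ~ 0 for this organized (rotational) flow
```

In this toy case the recovered eigenvalues are purely imaginary (±iω), the signature of organized rotational flow; disordered traffic would push eigenvalues away from the imaginary axis.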
This background provides a comprehensive overview of the key chal-
lenges and considerations in ATM. The increasing demand for airspace, the
complexities of trajectory prediction, and the rigid structure of traditional
ATM systems underscore the need for innovative approaches. This research
aims to address the critical gaps in trajectory prediction, heading angle dis-
tribution reconstruction, and air traffic complexity management. By tack-
ling these issues, the primary objective of this research is to introduce novel
methodologies aimed at enhancing the understanding and management of
air traffic complexity in ATM systems. Specifically, the research seeks to
address the following objectives:
• Propose a new metric to build and evaluate air traffic complexity map,
thereby improving safety and efficiency in ATM.
The number of aircraft travelling through a specific sector over a given time period is
referred to as air traffic density or congestion. Higher air traffic congestion
increases the risk of accidents, so more stringent monitoring
becomes necessary. Increased complexity can also have a negative effect
on the controller's decision-making ability, resulting in more errors. Thus,
so-called hot-spots of air traffic congestion require controllers to determine
whether conflict avoidance or modification is needed for any planned
trajectories. Measuring air traffic complexity is therefore a significant issue
in this regard.
In recent years, there has been a growing interest in developing meth-
ods to assess the complexity of air traffic. This has led to a variety of ap-
proaches to evaluate air traffic complexity which can be broadly categorized
into three major groups, including geometry-based approaches, agent-based
models, and machine learning techniques listed in Figure 1.2.
Figure 1.2: Some proposed approaches to evaluate the air traffic complexity.
[119]. While these metrics have proven effective in simple scenarios, they are
unsuitable for complex or large-scale applications. The approaches also fail
to account for spatial-temporal data [124]. The current work attempted to
overcome these challenges by implementing an air traffic complexity metric
based on linear dynamical systems. Numerous studies have examined traffic
structures and airspace geometries to demonstrate the efficacy of this
approach in estimating a metric for the local disorder and interaction of
trajectory sets in ATM. Its applicability derives from its efficiency, relative simplicity, and
mathematically predictable behavior. For example, topological entropy can
evaluate the air traffic complexity by capturing the connectivity and struc-
ture of the airspace [129, 39, 32]. Shannon entropy and the Kullback-Leibler
divergence have also been proposed to capture the complexity of air traffic in both
spatial and temporal dimensions, and can be used to evaluate complexity at
different levels of air traffic control [132, 65].
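As a minimal illustration of such entropy-based indicators (our own sketch; the binning, names, and test values are assumptions, not taken from [132, 65]), one can bin the headings of aircraft in a cell into an angular histogram and compare its Shannon entropy, or its symmetrized Kullback-Leibler divergence from the uniform distribution:

```python
import numpy as np

def heading_histogram(headings_deg, n_bins=12, eps=1e-12):
    """Angular histogram of aircraft headings, normalized to a PMF.
    Empty bins are clipped to eps so the logarithms below stay finite."""
    counts, _ = np.histogram(np.mod(headings_deg, 360), bins=n_bins, range=(0, 360))
    p = np.clip(counts / counts.sum(), eps, None)
    return p / p.sum()

def shannon_entropy(p):
    return float(-np.sum(p * np.log(p)))

def symmetrized_kl(p, q):
    """SKL(p, q) = KL(p || q) + KL(q || p)."""
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Organized flow (all headings near 90 deg) vs disordered flow (random headings)
uniform = np.full(12, 1 / 12)
organized = heading_histogram(np.full(100, 90.0))
disordered = heading_histogram(np.random.default_rng(1).uniform(0, 360, size=100))

assert shannon_entropy(organized) < shannon_entropy(disordered)
# Disordered traffic is closer to the uniform PMF, hence a smaller SKL
assert symmetrized_kl(disordered, uniform) < symmetrized_kl(organized, uniform)
```

The qualitative behavior is what matters here: low entropy (and large divergence from uniform) signals aligned, organized traffic, while high entropy signals disordered flow.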
Agent-based models use a simulation-based approach to model the be-
havior of individual aircraft and the interaction between them. This approach
considers various factors, such as weather, airspace structure, and traffic de-
mand, to assess the complexity of air traffic. Agent-based models have been
used to analyze ATFM and air traffic control procedures. Dynamic
density [11, 67] considers only a few factors, each assigned a subjective
weight. Aspects such as the number of aircraft, the
number of aircraft with heading changes greater than 15 degrees or speed
changes greater than 10 knots, the sector size, and others are taken into ac-
count. Some papers are related to the application of agent-based simulation
in various aspects of air traffic management such as simulating aircraft taxi-
ing operations [74], assessing unmanned aircraft systems integration in the
national airspace system [27, 127], evaluating and simulating air traffic flow
management strategies by [126, 128, 44, 46].
Machine learning techniques have become increasingly popular in the
field of air traffic complexity evaluation due to their ability to process large
amounts of data and extract patterns and relationships that may not be easily
identifiable through traditional analytical methods. These approaches have
been used in air traffic management to predict traffic flows, identify conflicts,
and optimize air traffic control procedures. For example, some studies have
developed machine learning techniques to evaluate air traffic complexity using
flight delay, cancellation, rerouting, and rescheduling [137], or using traffic
volume, density, and complexity level [69]. Other studies have used machine
learning approaches to predict and manage air traffic complexity by consid-
ering air traffic volume, flow, weather conditions, and airport capacity [21,
4, 28, 121, 137]. In 2023, Alharbi et al. [5] used CNN and LSTM
to predict air traffic flow as well as compute air traffic complexity.
The metrics used in the studies varied, including flight delay, cancellation,
rerouting, rescheduling, traffic volume, density, complexity level, air traffic
flow, weather conditions, and airport capacity. The papers highlighted the
potential of machine learning in accurately predicting and managing air traf-
fic complexity. However, limited data availability and potential biases in the
data were identified as potential limitations in some studies.
Hybrid approaches for air traffic complexity evaluation typically com-
bine two or more different techniques, such as geometric metrics, agent-based
models, or machine learning, in order to obtain a more comprehensive and
accurate understanding of the complexity of the airspace system. These ap-
proaches have become increasingly popular in recent years, as they allow for
a more nuanced evaluation of air traffic complexity that takes into account
both the geometric and dynamic aspects of the system. One notable ex-
ample is the hybrid approach proposed by Marwala and Nelwamondo, which
combines machine learning and agent-based methods for air traffic flow man-
agement. This method considers metrics such as delay, capacity, and safety,
providing a holistic view of air traffic complexity. However, it requires a
significant amount of data to train the machine learning models effectively,
highlighting the importance of robust data collection and management in air
traffic complexity assessment [81]. Another significant contribution is the
work of Arribas et al., who integrated geometric metrics (such as distance,
density, angle, and altitude difference) with machine learning techniques.
Their approach was validated using real air traffic data, demonstrating its
effectiveness in capturing the multifaceted nature of air traffic complexity.
By leveraging both geometric and data-driven insights, this hybrid method
enhances the accuracy and reliability of complexity evaluations, providing
valuable information for air traffic controllers and management systems [9].
Incorporating hybrid approaches into air traffic complexity assessment
offers several advantages. They enable the integration of multiple data
sources and methodologies, resulting in a more detailed and robust analysis.
This, in turn, supports the development of more adaptive and responsive air
traffic management strategies, crucial for handling the increasing demands
of modern airspace operations.
1.3.1 Factors contributing to air traffic complexity
Air traffic complexity is influenced by various factors that affect the manage-
ment and coordination of aircraft movements. Understanding these factors
is essential for improving the efficiency and safety of air traffic operations.
The design and organization of air routes are critical in determining air
traffic complexity. Air routes are predetermined paths that aircraft follow,
and their configuration can significantly influence traffic flow, congestion,
and the potential for conflicts. Efficiently designed air routes help manage
complexity by reducing bottlenecks and distributing traffic more evenly. Ad-
ditionally, airspace is divided into control sectors, each managed by a specific
air traffic control unit. The size, shape, and traffic volume of these sectors
greatly impact complexity. Smaller sectors with high traffic volumes or com-
plex geometries can increase the difficulty for controllers to manage aircraft
movements safely and efficiently.
Traffic density refers to the number of aircraft within a specific volume
of airspace at a given time. High-density areas, such as those near major
airports or busy air corridors, significantly increase complexity. Managing
high-density traffic requires precise coordination and efficient use of available
airspace to minimize conflicts and delays. Controllers must be adept at
handling a large number of aircraft simultaneously, ensuring safe separation
and smooth flow.
Different aircraft types have varying performance characteristics, includ-
ing speed, maneuverability, and size. These differences contribute to air
traffic complexity as controllers must consider the performance limitations
and capabilities of each aircraft when planning and coordinating movements.
Mixed traffic, involving both fast and slow aircraft, adds to the challenge,
requiring careful sequencing and spacing to maintain safety and efficiency.
Weather conditions, such as storms, turbulence, and low visibility, have
a substantial impact on air traffic complexity. Adverse weather can force
changes in flight paths, increase separation requirements, and lead to delays.
Seasonal variations also play a role, with certain times of the year posing more
significant weather-related challenges than others. Controllers and pilots
must adapt quickly to changing conditions to maintain safe and efficient
operations.
Human factors, including the workload, decision-making processes, and
communication between air traffic controllers and pilots, are critical ele-
ments of air traffic complexity. High complexity can lead to increased stress
and fatigue among controllers, potentially impacting their performance and
decision-making abilities. Effective training, communication protocols, and
support systems are essential for managing these human factors. Ensuring
that controllers are well-prepared and supported can mitigate the impact of
high complexity on human performance.
Technological advancements, such as Automatic Dependent Surveillance-
Broadcast (ADS-B), Next Generation Air Transportation System (NextGen),
and the Single European Sky ATM Research (SESAR) initiative, profoundly
impact managing air traffic complexity. These technologies enhance situa-
tional awareness, improve communication, and provide more accurate and
timely information, helping to reduce complexity and improve efficiency. By
leveraging these advancements, air traffic management can become more
proactive and adaptive, ensuring smoother and safer operations.
Given these factors, accurately determining the complexity of airspace
is crucial for effective air traffic management. Understanding complexity
helps in resource allocation, workload distribution, and strategic planning,
ensuring that both safety and efficiency are maintained. Among the factors
discussed, traffic density is often the most influential, as it directly impacts
the number of aircraft that must be managed simultaneously, thus driving
the need for precise coordination and effective use of airspace. Identifying
and addressing areas of high complexity allows for targeted improvements
and innovations in air traffic control strategies.
In a control sector, the control workload increases non-linearly with the num-
ber of aircraft. There exists a threshold beyond which controllers can no
longer accept additional aircraft, necessitating rerouting through less con-
gested neighboring sectors. This state, known as sector saturation, must be
avoided as it leads to cumulative overloading in preceding sectors, potentially
backing up to departure airports. Estimating the saturation threshold is com-
plex and depends on factors such as route geometry, sector layout, aircraft
distribution, and controller performance. A widely accepted threshold is 3
conflicts and 15 aircraft per sector, with a maximum duration of 10 minutes
to avoid excessive controller stress, ensuring optimal safety conditions.
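As a rough illustration (not from the thesis), the saturation rule above can be encoded as a simple check. The snapshot fields, thresholds, and function names below are our own; the stated threshold of 15 aircraft, 3 conflicts, and a maximal 10-minute duration is taken from the text.

```python
# Hypothetical sketch: flag sector saturation using the widely cited
# threshold of 15 aircraft and 3 simultaneous conflicts per sector,
# sustained for more than 10 minutes (600 s).
from dataclasses import dataclass

@dataclass
class SectorSnapshot:
    timestamp: float      # seconds
    aircraft_count: int
    conflict_count: int

def is_saturated(snapshots, max_aircraft=15, max_conflicts=3,
                 max_duration=600.0):
    """Return True if the sector exceeds either threshold continuously
    for longer than max_duration seconds."""
    overload_start = None
    for s in sorted(snapshots, key=lambda snap: snap.timestamp):
        overloaded = (s.aircraft_count > max_aircraft
                      or s.conflict_count > max_conflicts)
        if overloaded:
            if overload_start is None:
                overload_start = s.timestamp
            elif s.timestamp - overload_start > max_duration:
                return True
        else:
            overload_start = None
    return False
```

In practice the thresholds would be tuned per sector, since, as noted above, they depend on route geometry, sector layout, and controller performance.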
Control workload measurement is critical in many areas of ATM, as
it underpins optimization processes. Applications include airspace comparison (US/Europe), validation of future concepts (SESAR, NextGen), analysis of traffic control action modes, sectorization optimization, and traffic assignment optimization. Additionally, it is essential for
determining congestion pricing zones, developing organic control assistance
tools, generating 4D trajectories, and predicting congested zones.
Currently, the operational capacity of a control sector is measured by
the maximum number of aircraft traversing the sector within a given time
frame. This metric does not account for traffic orientation, treating struc-
tured and disordered traffic equally. Thus, controllers may accept traffic
exceeding operational capacity in structured scenarios while rejecting traffic
in disordered scenarios even if capacity is not reached. Therefore, measuring
the number of aircraft per unit time is an insufficient metric for representing
the difficulty of a traffic situation.
Ideally, a metric should precisely measure the mental effort required to
manage a set of aircraft. While challenging, complexity metrics extending
beyond simple aircraft counts are possible. Two essential notions are:
in conflict prediction and high trajectory interdependency. The mid-
dle situation, though presenting significant conflict risk, is manage-
able by giving a uniform direction to all aircraft to place them in safe
roundabout trajectories. The left situation, with its non-challenging
trajectories and stable relative distances, presents minimal immediate
difficulties.
• Flow-based metrics.
• Geometry-based metrics.
• Dynamic-based metrics.
Flow-based metrics
Flow-based metrics measure complexity based on traffic flow and density. They focus on the number of aircraft passing through a specific area over a period and the interactions between these aircraft. Traffic density metrics
measure the number of aircraft within a defined airspace volume, while the
traffic flow rate assesses the rate at which aircraft enter and exit a specific
sector or route. Flow-based metrics are particularly useful in high-density
airspaces, such as those around major airports. Monitoring traffic density
can help identify potential bottlenecks and enable proactive management to
prevent congestion. Flow-based metrics are straightforward and easy to in-
terpret, making them useful for quick assessments. However, they may not
capture the full complexity of interactions between aircraft or account for
dynamic changes in traffic patterns.
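The two flow-based metrics described above can be sketched in a few lines; the field layout, box-shaped airspace volume, and function names below are our own simplifications.

```python
# Hedged sketch of two flow-based metrics: traffic density (aircraft
# inside an airspace volume at one instant) and flow rate (entries into
# the volume per unit time).
import numpy as np

def traffic_density(positions, bounds):
    """Count aircraft whose (lon, lat, alt) lies inside a box volume.

    positions: (n, 3) array-like;
    bounds: ((lon0, lon1), (lat0, lat1), (alt0, alt1)).
    """
    positions = np.asarray(positions, dtype=float)
    inside = np.ones(len(positions), dtype=bool)
    for axis, (lo, hi) in enumerate(bounds):
        inside &= (positions[:, axis] >= lo) & (positions[:, axis] <= hi)
    return int(inside.sum())

def flow_rate(entry_times, window):
    """Entries per hour over a time window (t0, t1), times in seconds."""
    t0, t1 = window
    entries = [t for t in entry_times if t0 <= t < t1]
    return 3600.0 * len(entries) / (t1 - t0)
```

Monitoring these quantities over a grid of volumes is what enables the bottleneck detection mentioned above.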
Geometry-based metrics
Dynamic-based metrics
1.4 Contributions
Moreover, assessing spatial traffic complexity is essential for optimizing
ATM strategies. Current methodologies lack a systematic approach to quan-
tify the intricate spatial relationships within airspace, hindering efforts to
improve overall airspace efficiency and safety.
The four key contributions of the thesis are:
as explained in the next point. The resulting probability distributions
are to be exploited to produce complexity maps, as explained in the
next point.
orem, our model provides probabilistic trajectory forecasts and associated confidence levels, while the coordinate transformation enhances spatial awareness and predictive capability. We test on flight data collected from Hartsfield–Jackson Atlanta International Airport (ATL). Our experimental results demonstrate the effectiveness of the proposed BayesCoordLSTM-based approach in improving flight trajectory prediction accuracy, as measured by Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The integration of the Bayesian theorem and coordinate transformation into ConvLSTM models represents a substantial advancement in the field of flight trajectory prediction.
An important issue in air traffic management is the construction of traf-
fic complexity maps. Observed or simulated data provide a statistical sample
of trajectory angles. The underlying angular density is a crucial component
in assessing local complexity. In Chapter 3, we propose a Maximum Entropy
method for reconstructing the angular densities. These densities are constrained by the estimation of their Fourier coefficients. Accounting for the
statistical properties of moment estimation is achieved by using the square of
the Mahalanobis distance between statistical data and Fourier coefficients of
the sought density as the data fitting term. Partially finite convex program-
ming is then implemented to compute the optimal density. The cornerstone
of this approach, enabling a rigorous treatment of variational aspects of the
chosen method, is an infinite-dimensional version of Fenchel’s duality theo-
rem, coupled with results on the conjugacy of integral functionals. We also
discuss the numerical aspects of our approach and present some simulations
and reconstructions from simulated data.
In order to define and compute the air traffic complexity map, we use
information geometry. Chapter 4 focuses on the methodology for building
air traffic complexity maps by using the geodesic distance between the Max-
imum Entropy densities obtained in the previous chapter and the uniform
distribution, which corresponds for a given density of traffic to the maximal
complexity. We also test the method on real flight data, demonstrating its robustness; alternative criteria are possible by comparing heading distributions at grid cell boundaries. Our strategy for building complexity maps is both numerically
efficient and effective in capturing air traffic complexity.
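As a discretized illustration of this idea (the thesis's exact geodesic distance may differ), the Fisher–Rao distance between two histograms on a discretized circle reduces to 2·arccos of their Bhattacharyya coefficient; distance to the uniform distribution then serves as a complexity indicator.

```python
# Illustrative, discretized sketch: compare a reconstructed angular
# density to the uniform distribution via the Fisher-Rao geodesic
# distance on the probability simplex, d(p, q) = 2 * arccos(sum sqrt(p q)).
import numpy as np

def fisher_rao_distance(p, q):
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum(); q = q / q.sum()
    bc = np.sqrt(p * q).sum()              # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))

def distance_to_uniform(angular_density):
    """Distance 0 means the heading distribution is uniform, i.e. closest
    to maximal complexity for a given traffic density."""
    n = len(angular_density)
    uniform = np.full(n, 1.0 / n)
    return fisher_rao_distance(angular_density, uniform)
```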
In Chapter 5, we develop a Maximum Entropy approach to the evaluation of spatial density. This methodology extends previous contributions in the field, particularly those related to the modeling of species distribution. However, it introduces significant refinements by avoiding the
discretization of the sample space, thus allowing for a more continuous and
precise representation of spatial density. This continuous approach provides
a more accurate framework for analyzing complex spatial patterns, which is
particularly valuable in various scientific and engineering applications. While
this chapter focuses primarily on the theoretical development and demonstra-
tion of the methodology, its potential use in ATM, particularly for optimizing
airspace utilization and enhancing safety, will be explored and expanded upon
in future work.
Finally, Chapter 6 summarizes the thesis and draws conclusions and potential research directions based on the presented results and methodologies.
Chapter 2
2.1 Introduction
work weights, biases, and model predictions, are single point-estimates. This
determinism necessitates a large dataset for training, which makes convolu-
tional neural networks (CNNs) vulnerable to overfitting on small datasets.
Consequently, DNNs may perform well on training data but fail to general-
ize to new data, leading to overconfidence or uncertainty in their predictions.
This issue can be addressed using the Bayesian method.
The Bayesian method has become crucial for enhancing DNNs. In a Ba-
yesian model, the posterior distribution provides comprehensive information
about unknown parameters given the data. Various techniques, including
Markov Chain Monte Carlo (MCMC), Laplace approximation, expectation
propagation, and variational inference, have been employed to approximate
the posterior distribution. These methods make Bayesian models robust
to overfitting, capable of estimating uncertainty, and effective at learning
from smaller datasets. Variational inference, in particular, minimizes the
Kullback-Leibler divergence between a proposed approximate posterior and
the true unknown posterior distribution of the weights, addressing the in-
tractability of the exact Bayesian method.
In air traffic management (ATM), the increasing airspace demand puts
pressure on current systems to ensure aircraft maintain safe separation stan-
dards to avoid collisions [40]. This necessitates accurate predictions of aircraft trajectories in the near future [98]. However, this is challenging due to the un-
certainty inherent in predicting aircraft trajectories. Therefore, understand-
ing and managing this uncertainty is crucial in ATM, where meteorological
factors, especially winds, represent significant sources of uncertainty. Uncer-
tainty in weather forecasts arises from inadequate or inaccurate knowledge
of the atmosphere’s state at the time of prediction and the uncertainties of
the model [40]. These uncertainties propagate through nonlinear and chaotic
atmospheric dynamics, making simple statistical characterization inadequate
for depicting prediction uncertainty. Consequently, these factors can affect
aircraft velocity and the accuracy of trajectory predictions. Most methods
for flight conflict resolution do not account for this uncertainty. As a result,
they produce solutions like collision-free flight routes that are vulnerable to
disturbances and increase the risk of violating separation standards.
As air transport data volumes increase, novel analysis techniques and
operational strategies are crucial in ATM for more precise trajectory predic-
tion models. However, uncertainties, including weather and unknown vari-
ables from un-modeled actors, remain significant challenges in predicting
flight trajectories [88]. Researchers have explored many approaches, from
physics-based methods relying on flight dynamics to data-driven methods
using machine learning algorithms to analyze historical data [70]. As de-
mands on ATM grow and the need for more accurate predictions intensifies,
there is a rising interest in data-driven approaches to overcome the limita-
tions of traditional methods. One significant challenge in predicting aircraft
trajectories is the uncertainty arising from factors such as changing weather
conditions. Many researchers utilize the Bayesian method to quantify uncer-
tainty in predicting time-series data, including flight trajectory prediction in
ATM.
Despite significant advancements in trajectory prediction, data-driven
approaches remain susceptible to overfitting and often struggle to generalize
well to new and unseen flight patterns. Furthermore, information about
an aircraft's position, altitude, and heading is typically processed relative to
a reference point, making coordinate transformation essential for accurately
understanding and predicting aircraft trajectories. In this chapter, we aim to
improve flight trajectory prediction accuracy by integrating Bayesian theory
and coordinate transformation in a proposed model called BayesCoordLSTM.
The primary contributions of this study can be summarized as follows:
• Empirical validation. We test our model on a real dataset and
compare its prediction performance with three models, including 3D-
CNN [95], CNN-Gated Recurrent Unit (GRU) [106], and a combination
of CNN-GRU with 3D Convolution (CG3D) [107]. The analysis high-
lights the superior accuracy and robustness of our proposed model in
flight trajectory prediction, demonstrating its practical utility in ATM.
The remainder of this chapter follows this structure: Section 2.2 presents
a brief literature review on flight trajectory prediction. Some basic informa-
tion of LSTM are presented in Section 2.3. Section 2.4 illustrates our model
and the experimental results of a real flight dataset are presented in Sec-
tion 2.5. Finally, Section 2.6 summarizes the conclusions and outlines future
works in this field.
Figure 2.1: Different stages of a flight [135].
diction accuracy while reducing the size of the prediction interval. This
type of prediction facilitates immediate conflict risk detection, allowing
for effective conflict resolution. Additionally, accurate trajectory pre-
dictions can support the generation of viable alternative trajectories,
accommodating existing constraints.
Therefore, in recent years, data-driven approaches have gained popu-
larity, leveraging historical flight data and other relevant variables to learn
patterns and relationships. These patterns are used to create a data-driven
model, which is trained to represent the collective behavior of the trajec-
tories based on input features (e.g., past positions, velocities) and future
trajectory predictions using machine learning models. According to [35], al-
though machine learning is one of the most researched topics in computer
science, it has not yet been widely adopted by end-users in the ATM domain. This research demonstrates that the number of papers on machine learning in ATM has significantly increased in recent years, especially in predicting and optimizing hotspots to avoid collisions and managing traffic flows, one of the most consistent subjects of machine learning in ATM. A machine learning model can extract common traits by learning
from extensive historical trajectory data. These techniques can also incor-
porate additional inputs such as aircraft performance, air route structures,
and environmental factors (data-to-data). Supervised learning, in particular, is a well-known approach
used by many researchers for trajectory prediction. By learning from large
datasets of historical trajectory data, these models can generalize and predict
future trajectories effectively. Furthermore, these methods can utilize flight
performance metrics, flight path structures, and environmental parameters
as additional inputs for enhanced trajectory prediction.
Machine learning algorithms consistently provide accurate predictions
and profit from a wide range of trainable parameters [34]. These approaches
often use filter-based methods [94, 130] or deep learning-based methods [107]
to extract the properties of data changes through data analysis to forecast
the trajectory. Recently, significant advancements have been made in flight
trajectory prediction through the use of advanced machine learning tech-
niques, particularly CNN and Recurrent Neural Network (RNN). Among the
various types of RNNs, LSTM networks have proven particularly effective
for handling sequences and time series data. In 2021, Shi et al. proposed a
constrained LSTM model specifically for predicting flight trajectories [111].
In their research, they defined three types of constraints: climbing, cruis-
ing, and descending/approaching phases. Their constrained LSTM model
incorporates these phase-specific constraints to account for the dynamic na-
ture of flight, ensuring both long-term dependencies and realistic physical
constraints. Compared to a standard LSTM, this model significantly en-
hances prediction accuracy by maintaining trajectory continuity and effec-
tively managing sparse waypoints in flight data. Other methods applied the
Bayesian method in machine learning models: Graph Convolutional Network
to forecast the traffic flow in the next 15, 30, 45, and 60 minutes (SZtaxi
and Los-loop) [45], a deep Gaussian process [29], LSTM and the LSTM’s
variants, including gated LSTM [31], convolutional LSTM [110, 87, 108],
CNN-LSTM [73, 76], deep LSTM [87, 131, 30, 1], and sequence-to-sequence
LSTM [114]. In these works, CNNs were used to extract spatial features from local regions of images, while RNNs and LSTMs captured long-range dependencies.
These studies provide valuable references for the model architecture used in
this work.
LSTM networks are designed to handle sequential data and capture
temporal dependencies effectively, making them well-suited for tasks involv-
ing time series data. However, these models are not inherently equipped
to capture spatial features unless they are modified or combined with other
methods. In applications such as flight trajectory prediction, where both
temporal and spatial features are crucial, a standard LSTM may fall short
in adequately addressing spatial aspects. This is why additional constraints
or hybrid models that incorporate spatial information are often necessary.
In this context, CNN is more suitable for extracting spatial features. Hence,
hybrid CNN-LSTM [76, 71] or combined CNN-GRU [107, 116] approaches have been widely used in classification and prediction tasks. For
example, Shafienya et al. (2022) [106] compared various CNN-GRU models
for predicting 4D flight trajectories using ADS-B data. CNNs extract spa-
tial features from flight data, while GRUs capture temporal dependencies.
Combining CNNs with GRUs enhances prediction accuracy by effectively
capturing both spatial and temporal patterns. They subsequently extended
their research by combining a CNN-GRU with a 3D-CNN, which effectively
extracts spatial-temporal features for prediction [107]. However, a potential
drawback of their models is their reliance on large datasets for training and
evaluation, as well as the complexity involved in parameter tuning for the
combined CNN-GRU and 3D-CNN architectures.
Additionally, these approaches often learn to predict only specific types of flight trajectories. Although machine learning has achieved significant advancements in trajectory prediction, this remains a limitation: to handle different trajectory types or routes, the model must be retrained or its hyperparameters adjusted, which increases both the cost and the time required for retraining.
Moreover, in flight trajectory prediction, uncertainty can arise from fac-
tors such as weather conditions or unpredictable aircraft behaviors, as well
as from the data itself or the optimal values of model parameters. Failure to
capture these uncertainties can result in models that do not provide prob-
abilistic predictions or associated confidence levels, ultimately diminishing
prediction accuracy. This can have significant implications for safety and
performance in the aviation industry.
To address this problem, the Bayesian framework offers several advan-
tages: it accommodates various prior ranges, avoids costly computations,
integrates well with LSTM models for continuous-time responses, and effec-
tively handles large datasets. Many researchers have applied the Bayesian
method to various models, including Graph Transformers [88] and CNN-GRU
architectures [59]. This approach not only estimates model uncertainties but
also optimizes predictions to enhance performance [10, 2].
In a 3D space, as aircraft move, information about their position, alti-
tude, and heading is processed relative to a reference point. Without coordi-
nate transformation, the model may not effectively capture the real spatial
understanding or accurately depict the complex variations in flight trajecto-
ries [118]. This limitation diminishes the accuracy and reliability of aircraft
position predictions. Therefore, integrating the Bayesian method with coor-
dinate transformation into these models enhances their capability to handle
flight trajectory predictions. This combination improves the model’s ability
to manage uncertainty, benefiting ATM by enhancing safety and efficiency,
and optimizing model parameters for better and more stable performance.
By learning from historical data, data-driven approaches can adapt to
changing conditions and improve the accuracy and reliability of flight path es-
timation. However, challenges such as generalizing from sparse data, dealing
with unpredictable weather conditions, navigating dynamic airspace struc-
tures, and achieving real-time prediction remain. To address these challenges,
integrating physics-based knowledge with data-driven techniques presents a
promising solution. This hybrid approach leverages the strengths of both
methods, offering enhanced prediction accuracy and reliability in ATM [70].
2.3 Long Short-Term Memory
• Forget gate f : decides which information from the previous cell state should be discarded.
• Input gate i: determines which new information should be stored in the cell state.
• Output gate o: controls which part of the cell state should be output as the current hidden state.
These gates consist of sigmoid layers, with output values between 0 and 1.
These values decide how much information to retain or discard, enabling
LSTMs to better control the flow of information over time.
The new cell state Ct is updated by combining the filtered previous state and the new candidate values. This update is performed as follows:

Ct = ft ∗ Ct−1 + it ∗ C̃t ,

where ft helps in discarding or retaining the old information while it and C̃t add new information to the cell state.
Finally, the LSTM decides what to output as the hidden state at the current time step through the output gate. The output gate uses a sigmoid layer to filter the cell state and determine which parts of it should be used to compute the hidden state:

ht = ot ∗ tanh(Ct ),

where ot is the activation of the output gate.
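The gate mechanics described in this section can be condensed into a minimal NumPy sketch of one LSTM cell step; the weight and parameter names are illustrative, not the thesis's implementation.

```python
# Minimal NumPy sketch of a single LSTM cell step, matching the gate
# description above: forget gate f_t, input gate i_t, candidate C~_t,
# cell update C_t, output gate o_t, hidden state h_t.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step. params holds weight matrices W* of shape
    (hidden, hidden + input) and bias vectors b* of shape (hidden,)."""
    z = np.concatenate([h_prev, x])                      # [h_{t-1}, x_t]
    f = sigmoid(params["Wf"] @ z + params["bf"])         # forget gate
    i = sigmoid(params["Wi"] @ z + params["bi"])         # input gate
    c_tilde = np.tanh(params["Wc"] @ z + params["bc"])   # candidate state
    c = f * c_prev + i * c_tilde                         # new cell state
    o = sigmoid(params["Wo"] @ z + params["bo"])         # output gate
    h = o * np.tanh(c)                                   # new hidden state
    return h, c
```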
and process intricate spatial patterns in flight trajectories. The second com-
ponent is a Bayesian LSTM network, designed to capture long-term temporal
dependencies. By incorporating the Bayesian method, the LSTM network
enhances its ability to manage uncertainties and provide more robust predic-
tions over extended time sequences. Together, these components enable the
BayesCoordLSTM to deliver superior performance in flight trajectory pre-
diction by integrating advanced spatial and temporal data processing with
probabilistic reasoning.
[Architecture diagram: the input 4D flight trajectories (longitude, latitude, altitude over time) are normalized and passed through a coordinate transformation producing (R, θ, h); a CNN extracts spatial features, and a Bayesian LSTM followed by a fully connected (FC) layer outputs the predicted trajectories for time steps t + M + 1 to t + 2M.]
In the context of this research, the inputs and outputs of our model for
flight trajectory prediction are defined as follows:
• Input: P_i^t , P_i^{t+1} , . . . , P_i^{t+M} represents the observed trajectory i at time t.
– Each trajectory point P_i^t is represented as
P_i^t = (long_i^t , lat_i^t , alt_i^t , v_i^t , θ_i^t ),
where each point has five components, corresponding to longi-
tude, latitude, altitude, speed, and heading of the flight at time t,
respectively.
• Output: P_i^{t+M+1} , P_i^{t+M+2} , . . . , P_i^{t+2M} represents the predicted flight trajectories.
For example, assume that we use 100 points to predict five points in the future, so that M = 100. For flight i, starting at time t = 1, the input is (P_i^1 , P_i^2 , . . . , P_i^{100} ). The output then becomes (P_i^{101} , P_i^{102} , . . . , P_i^{105} ), corresponding to five points in the future.
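This input/output construction can be sketched as follows; the function and argument names are our own, and K denotes the number of predicted points (five in the example above).

```python
# Hedged sketch: from a trajectory of 5-component points, take M past
# points as the input window and the next K points as the target.
import numpy as np

def make_samples(trajectory, m=100, k=5, step=1):
    """trajectory: (n, 5) array of (lon, lat, alt, speed, heading) points.
    Returns (inputs, targets) with shapes (s, m, 5) and (s, k, 5)."""
    traj = np.asarray(trajectory, dtype=float)
    inputs, targets = [], []
    for start in range(0, len(traj) - m - k + 1, step):
        inputs.append(traj[start:start + m])
        targets.append(traj[start + m:start + m + k])
    return np.array(inputs), np.array(targets)
```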
R = ∥x_fp − x_ap ∥ / D_max ,   (2.6)

θ = arcsin( (long_fp − long_ap ) / ∥x_fp − x_ap ∥ ),   (2.7)

h = |alt_fp − alt_ap | / A_max ,   (2.8)
where:
• Amax corresponds to the highest achievable altitude within the predic-
tive domain;
In this thesis, we choose Dmax = 328084 feet and Amax = 39370 feet to
predict aircraft trajectories, balancing computational efficiency and practi-
cality. These values provide sufficient coverage while conserving computa-
tional resources and align with typical aircraft operating ranges and altitudes,
ensuring relevance to real scenarios.
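Equations (2.6)–(2.8) can be sketched as below, with D_max = 328084 ft and A_max = 39370 ft as chosen above. We read x_fp and x_ap as the flight point and the airport reference point; for illustration only, positions are treated as planar vectors already expressed in a common distance unit, and the flight point is assumed distinct from the reference.

```python
# Sketch of the coordinate transformation of Equations (2.6)-(2.8).
import numpy as np

D_MAX = 328084.0   # feet, maximal horizontal range of the predictive domain
A_MAX = 39370.0    # feet, maximal altitude of the predictive domain

def transform(lon_fp, lat_fp, alt_fp, lon_ap, lat_ap, alt_ap):
    dist = np.hypot(lon_fp - lon_ap, lat_fp - lat_ap)  # ||x_fp - x_ap||
    R = dist / D_MAX                                   # Eq. (2.6)
    theta = np.arcsin((lon_fp - lon_ap) / dist)        # Eq. (2.7)
    h = abs(alt_fp - alt_ap) / A_MAX                   # Eq. (2.8)
    return R, theta, h
```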
3. Output: The output from the Conv1D layer consists of 32 feature maps,
each representing a learned pattern from the input data. This enriched
representation retains critical information for subsequent layers in the
model.
2.4.3 Bayesian optimization
Bayesian optimization is a powerful technique for optimizing complex, high-
dimensional functions with costly evaluations, such as the hyperparameters
of deep learning models. Incorporating Bayesian methods into LSTM models, as proposed by several researchers [82, 99, 139, 88, 59], involves cre-
ating a probabilistic model to capture the uncertainty in the function being
optimized. This approach is advantageous because it enables a more efficient
and informed optimization process, leveraging prior knowledge to minimize
the number of evaluations required. The Bayesian framework’s flexibility
with various prior ranges, its avoidance of expensive computations, and its
compatibility with LSTM models for continuous-time responses make it well-
suited for handling large datasets. By utilizing Bayes’ theorem, the hyperpa-
rameters of the LSTM model can be tuned to specific characteristics of the
data, enhancing model performance and providing insights into uncertainty.
In the context of Bayesian LSTM modeling, where uncertainty is critical
for parameter estimation, we use a general formula for sampling weights W
and biases b at the tth time step of the nth layer. These are given by Equa-
tion (2.9) and Equation (2.10), respectively:
W_(n)^(t) = N (0, 1) ∗ log(1 + ρ_(w)^(t)) + µ_(w)^(t) ,   (2.9)

and

b_(n)^(t) = N (0, 1) ∗ log(1 + ρ_(b)^(t)) + µ_(b)^(t) ,   (2.10)
where N (0, 1) represents sampling from a standard Gaussian distribution,
introducing randomness in weight and bias initialization, and ρ and µ are
discussed hereafter. This randomness is vital for effectively exploring the
solution space during training.
The term log(1 + ρ) maps the estimated scale parameter ρ into a non-negative space, ensuring a non-negative standard deviation for the sampled weights and biases, which is essential for numerical stability during both training and inference.
Moreover, incorporating the mean value µ derived from the training
data into the sampling process enables the model to utilize prior knowledge
about the distribution of weights and biases. This enhances parameter esti-
mation efficiency and improves the model’s performance to capture complex
temporal dependencies.
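Equations (2.9) and (2.10) can be sketched directly. Note that we follow the text's log(1 + ρ) scale, which is non-negative for ρ ≥ 0; a common variant in the Bayes-by-backprop literature instead uses the softplus log(1 + e^ρ).

```python
# Sampling weights/biases as written in Equations (2.9)-(2.10):
# elementwise standard-normal noise scaled by log(1 + rho), shifted by mu.
import numpy as np

def sample_parameters(mu, rho, rng=None):
    """mu, rho: arrays of the same shape (mean and raw scale per parameter)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, float)
    rho = np.asarray(rho, float)
    sigma = np.log1p(rho)          # log(1 + rho), non-negative for rho >= 0
    eps = rng.standard_normal(mu.shape)
    return eps * sigma + mu
```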
In this phase, Bayesian optimization leverages Bayesian methods
to compute the posterior distribution of the objective function, assisting in
the selection of hyperparameters for LSTM models. It iteratively refines
the model based on past evaluations to optimize performance. If the re-
sults are unsatisfactory, the hyperparameters and architecture are reassessed;
otherwise, successful outcomes guide future predictions using the optimized
weights. In our proposed model, Bayesian optimization plays a crucial role
in identifying key LSTM hyperparameter values (see Figure 2.4 and Algo-
rithm 1).
[Figure 2.4: Flowchart of the hyperparameter tuning loop. The training set together with candidate hyperparameters and architecture feed the Bayesian optimization step; the resulting model is evaluated, and poor predictability loops back to the selection of hyperparameters and architecture.]
Algorithm 1 Bayesian Optimization for LSTM Hyperparameters
1: Input: Set of flight trajectories X , Total number of iterations N , Length
of trajectory points M , Objective function f , Acquisition function A.
2: Output: Θ (Set of observed pairs of hyperparameters and objective
function values).
3: Initialize Θ ← ∅; n ← 0
4: Define X = {Xi }
5: while n ≤ N do
6: x∗ ← argmaxx A(x | Θ)
7: y ← f (x∗ )
8: Θ ← Θ ∪ {(x∗ , y)}
9: Train the LSTM model using the updated Θ
10: Compute the weights W and biases b based on Equation (2.9) and
Equation (2.10).
11: n←n+1
12: end while
13: Return Θ
The acquisition function is the criterion by which the next set of hyperparameters
is chosen from the surrogate model. The most common choice is Expected Improvement (EI) [62]:

EI(x) = ∫_{−∞}^{∞} max(y − y∗, 0) p(y | x) dy, (2.11)

where y∗ is the best objective value observed so far and p(y | x) is the surrogate
model's predictive density of the objective at x.
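When the surrogate posterior p(y | x) is Gaussian with mean µ(x) and standard deviation σ(x), the integral in Equation (2.11) admits a well-known closed form. A hedged sketch (the numbers below are illustrative):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """Closed form of Eq. (2.11) for a Gaussian posterior N(mu, sigma^2):
    EI = (mu - y_best) * Phi(z) + sigma * phi(z),  z = (mu - y_best) / sigma."""
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

# Cross-check against direct numerical integration of max(y - y_best, 0) p(y | x)
mu, sigma, y_best = 1.2, 0.7, 1.0
y = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
f = np.maximum(y - y_best, 0.0) * norm.pdf(y, mu, sigma)
ei_numeric = np.sum((f[1:] + f[:-1]) * np.diff(y)) / 2  # trapezoid rule
```

In the maximization convention of (2.11), candidates whose posterior mass lies above the incumbent y∗ score highest; Bayesian optimization evaluates the objective at the EI maximizer next.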
2.5 Experimental results
Dataset
the same aircraft, and a collection of these records together forms the tra-
jectory Xi . In this research, we utilize the complete set of trajectory data
sourced from geographical coordinates associated with ATL airport. Each
trajectory is recorded at one-second intervals, providing detailed track points
for analysis.
Feature Value
Timestamp 1537009474
Flight ID e8044e
Longitude -84.402633978396
Latitude 33.7542572021484
Altitude 12275.82
Speed 260.53122347432
Heading 12.1974800580645
Hour 1537009200
The dataset includes both static and dynamic information, with the
focus on dynamic data such as heading, speed, and 4D data (timestamp,
longitude, latitude, and altitude). Updates to this data occur every 5 seconds.
To address the continuity and smoothness of flight trajectories [76], missing
points are filled using cubic spline interpolation, which offers a smooth
and accurate method for filling in gaps in the dataset (see Figure 2.5). To
manage the spatial range of the input data and enhance forecast accuracy
and smoothness, we employ a sliding window approach with a window size
of 100 seconds and a step size of 1 second. This method segments the data
into 105-second intervals, where the first 100 seconds are used as historical
data and the last 5 seconds serve as the prediction window.
Given a series of trajectory points P1 , P2 , P3 , . . . , Pn , with a time window
size of 100, the trajectory at time t can be expressed as
X_i^t = {P_i^{t−99}, P_i^{t−98}, P_i^{t−97}, . . . , P_i^t}.
Each trajectory segment contains 100 consecutive trajectory points. The
sliding window is moved forward continuously to perform the segmentation
operation until the last trajectory point is reached. The segmented trajectory
is denoted by
X = {X_i^t, X_i^{t+1}, X_i^{t+2}, . . . , X_i^n},
(a) Before interpolation (b) After interpolation
Figure 2.5: An example of addressing missing data before (left) and after
(right) using cubic spline interpolation.
This results in 21,258 trajectories, which are then divided into training
and test sets of 17,006 and 4,252 samples, respectively.
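The preprocessing described above (gap filling followed by 100 s history / 5 s prediction segmentation) can be sketched as follows; the toy series and its timestamps are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gaps(t_obs, x_obs, t_full):
    """Fill missing track points on a regular 1 s grid with a cubic spline."""
    return CubicSpline(t_obs, x_obs)(t_full)

def segment(track, hist=100, horizon=5):
    """Slide a (hist + horizon)-point window with step 1 over the track."""
    windows = []
    for start in range(len(track) - hist - horizon + 1):
        windows.append((track[start:start + hist],                    # 100 s history
                        track[start + hist:start + hist + horizon]))  # 5 s target
    return windows

# Toy altitude-like series with two missing seconds (illustrative data)
t_full = np.arange(0, 200)                     # 1 s sampling
t_obs = np.setdiff1d(t_full, [50, 51])         # seconds 50-51 are missing
x_obs = np.sin(0.05 * t_obs)
track = fill_gaps(t_obs, x_obs, t_full)
pairs = segment(track)
```

Each element of `pairs` is one 105-second sample: a 100-point history and the 5 points to be predicted.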
ground truth values using metrics like RMSE and MAE to quantify
accuracy, helping gauge the model’s ability to generalize and providing
insights into overall performance.
Normalization

X_new = (X − min(X)) / (max(X) − min(X)), (2.12)

where min(X) and max(X) denote the minimum and maximum values of the
feature X, so that each feature is rescaled to the range [0, 1].
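A minimal sketch of per-feature min-max scaling as in Equation (2.12), together with the inverse mapping needed to bring predictions back to physical units (the sample matrix is illustrative):

```python
import numpy as np

def minmax(X):
    """Per-column min-max normalization, Eq. (2.12): maps each feature to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo), (lo, hi)

def invert(Xn, lo, hi):
    """Undo the scaling to recover values in the original units."""
    return Xn * (hi - lo) + lo

X = np.array([[0.0, 100.0], [5.0, 150.0], [10.0, 300.0]])  # illustrative features
Xn, (lo, hi) = minmax(X)
```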
Table 2.3: Basic parameters of the proposed model.
advancements and performance improvements of the BayesCoordLSTM
relative to these established alternatives.
of outliers. Furthermore, the incorporation of the Bayesian method helped
prevent overfitting by using probability distribution during training, thereby
improving the model’s generalization ability. This combination of techniques
enabled our proposed model to achieve more accurate predictions. As seen
in Table 2.4, our method achieved an RMSE of 0.2154 and MAE of 0.1624,
demonstrating its superior predictive accuracy compared to the other models.
Although the training time of the BayesCoordLSTM, at 9450 seconds,
is slightly longer than that of CG3D, at 9042 seconds, this marginal in-
crease is outweighed by the significant improvements in both RMSE and
MAE. Therefore, the BayesCoordLSTM not only achieves superior predic-
tive performance but also maintains competitive computational efficiency,
underscoring its overall effectiveness in the comparative analysis.
We then explore the individual and combined impact of integrating the Ba-
yesian method and coordinate transformation within the BayesCoordLSTM
framework. Through a series of controlled experiments, we evaluate three
configurations of the CNN-LSTM model: the baseline CNN-LSTM, CNN-
LSTM with coordinate transformation, and CNN-LSTM with both the Baye-
sian method and coordinate transformation. This analysis provides insights
into how each enhancement contributes to improving the accuracy and reli-
ability of flight trajectory predictions.
The prediction comparisons for a single aircraft, as shown in Figure 2.7a,
Figure 2.7b, and Figure 2.7c, further illustrate these improvements. The
results demonstrate that the BayesCoord(CNN-LSTM) model consistently
produces the most accurate trajectory predictions, closely aligning with the
actual trajectory (see Figure 2.7d).
In Figure 2.7, the predicted and actual trajectories in 3D space are de-
picted. It can be observed that the predicted trajectories of the three models
exhibit similar trends to the actual trajectories, but notably, the CNN-LSTM
model’s predicted curve deviates significantly from the other two models.
The use of coordinate transformation in the Coord(CNN-LSTM) and the
proposed models helps reduce errors in predicting latitude, longitude, and
especially altitude. As shown in Figure 2.7d, the predicted trajectory by
the proposed model closely matches the actual trajectory with the smallest
prediction error, followed by the Coord(CNN-LSTM) and CNN-LSTM models.

Figure 2.7: (a) CNN-LSTM. (b) Coord(CNN-LSTM).
Based on these results, the evaluation and comparison of the CNN-
LSTM, Coord(CNN-LSTM), and BayesCoord(CNN-LSTM) models in pre-
dicting trajectories in 3D space reveal significant differences in prediction ac-
curacy. The Coord(CNN-LSTM) model shows improvement in latitude and
longitude prediction compared to the baseline CNN-LSTM, highlighting the
effectiveness of coordinate transformation. Moreover, the BayesCoord(CNN-
LSTM) model achieves the best performance, producing trajectories closest
to reality with the smallest prediction errors among the evaluated models.
2.6 Conclusions
enhances predictive performance. While our study provides a solid founda-
tion for predicting flight trajectories, future research could explore dynamic
hyperparameter adaptation techniques. These techniques could involve real-
time adjustments to hyperparameters in response to evolving flight scenarios
or changing weather conditions, further enhancing the BayesCoordLSTM
model’s adaptability and real-time predictive capabilities.
Chapter 3
Reconstructing angular
probability density by using
Maximum Entropy
An important issue in air traffic management is the construction of traffic
complexity maps. Observed or simulated data provide a statistical sample
of trajectory angles. The underlying angular density is a crucial component
in assessing local complexity. Our approach involves applying the Maximum
Entropy principle, constrained by the statistical estimation of Fourier coef-
ficients for the desired density. Accounting for the statistical properties of
moment estimation is achieved by using the square of the Mahalanobis dis-
tance between statistical data and Fourier coefficients of the sought density
as the data fitting term. Partially finite convex programming is then imple-
mented to compute the optimal density. The cornerstone of this approach,
enabling a rigorous treatment of variational aspects of the chosen method, is
an infinite-dimensional version of Fenchel’s duality theorem, coupled with re-
sults on the conjugacy of integral functionals. We also discuss the numerical
aspects of our approach and present some simulations and reconstructions
from simulated data.
3.1 Introduction
relevant, particularly in complex systems such as those found in ATM [41,
47]. This chapter addresses this challenge by proposing a framework for re-
constructing angular probability density functions through the lens of the
Maximum Entropy principle.
The driving force behind our endeavor stems from the pressing need to
estimate the complexity of air traffic from flight trajectory data [63, 104].
With the exponential growth in air travel, understanding and managing the
intricate dynamics of air traffic flows has become paramount [36]. By har-
nessing the power of Maximum Entropy reconstruction, we aim to unveil
underlying patterns and structures within flight trajectory data, thereby
shedding light on the complexity of air traffic systems.
Our approach hinges on several key pillars, each meticulously designed
to navigate a specific aspect related to angular density reconstruction. First
and foremost, we formulate the reconstruction problem within the framework
of the Maximum Entropy principle. By leveraging this principle, we seek
solutions that strike a balance between fidelity to observed constraints and
maximization of entropy, thus ensuring maximal neutrality while remaining
consistent with available data.
A distinguishing feature of our methodology lies in the formulation of
constraints in terms of Fourier coefficients derived from statistical samples.
This not only allows us to encapsulate essential characteristics of the angular
distribution but also enhances the integration of empirical data into the
reconstruction process. Moreover, to account for the inherent variability within
the data, we introduce a fit term expressed via the squared Mahalanobis
distance, thereby weighing the evidence of each Fourier coefficient according
to its uncertainty expressed through the covariance term.
Critically, we provide a rigorous treatment of the underlying optimization
problem, which is posed in an infinite-dimensional space. Drawing upon
concepts from partially finite convex programming and leveraging insights
into the conjugate of convex integral functionals, we devise a robust method-
ology for tackling this challenge. Importantly, our approach avoids the need
for discretization, thereby offering an efficient numerical solution for com-
puting Maximum Entropy solutions.
3.2 Literature review
uncertainty. Gomes et al. (2014) developed a methodology to estimate prob-
ability densities even in environments with noisy data, proving robust in sim-
ulations but needing further validation in real scenarios [51]. This capability
is particularly relevant in fields where data integrity may be compromised.
Lately, with the introduction of new methods, advancements in entropy
estimation techniques have played a vital role in enhancing the effectiveness
of the Maximum Entropy principle. In 2011, Behmardi et al. presented a
parametric entropy estimator that applied the Maximum Entropy principle
for approximating distributions without relying on local density estimation.
The estimator showed significant accuracy improvements over traditional
methods, especially in complex sensor network data scenarios, providing a
robust tool for various applications in signal processing and machine learn-
ing [13]. Although their parametric entropy estimator can avoid the pitfalls
of local density estimation, providing a more reliable tool for data analysis
in complex environments like sensor networks, it might not be as flexible
in handling diverse or non-standard data distributions without adjustments
or extensions to the model. Addressing this gap, Lakshmanan and Pichler's recent
study in 2023 introduced a novel approach to soft quantization using entropic
regularization, providing a promising alternative to traditional hard quanti-
zation methods that often lead to significant information loss. Their method,
grounded in robust theoretical formulations, demonstrates improved robust-
ness and signal fidelity across various data sets, especially in the presence
of noise and data variability. However, the practical implementation details
and computational efficiency of this approach are not extensively covered,
which could be crucial for real applications where computational resources
and processing time are limited [66].
Recent developments in this domain have shown varied applications and
improvements to the principle. From a theoretical perspective, one notable
application of the Maximum Entropy principle is in the reconstruction of an-
gular probability density functions (PDFs) from observed or simulated data.
In 2017, Gresele and Marsili explored both the philosophical and practical
aspects of Maximum Entropy principle, highlighting its applicability across
various scientific domains. However, the study focused only on theoretical
aspects, lacking examples of direct empirical application [52].
A few years later, Amari et al. (2018) and Gzyl et al. (2021) provided
deeper insights into the mathematical underpinnings of the Maximum
Entropy principle, connecting it with significant statistical concepts like the
Wasserstein distance and Kullback-Leibler divergence [7, 53]. Although lacking
experimental results on simulations, these studies offer a comprehensive
understanding of the principle's integration with broader probability theory,
though their complexity may pose challenges for those without a strong
mathematical background. Wan et al. in 2019 [120] utilized Maximum En-
tropy to enhance the accuracy of Principal Component Analysis, providing a
clear advantage in data interpretation, although the method’s increased com-
putational demand may limit its applicability for large or real-time datasets.
These studies collectively underscore the importance and potential of
Maximum Entropy in a range of applications, from enhancing computational
algorithms to addressing real data challenges. The principle’s adaptability
and robustness, demonstrated across different scenarios and data complexi-
ties, support its continued use and development for complex systems analysis,
including air traffic complexity estimation.
In air traffic management, the precise reconstruction of angular distribu-
tions is crucial for assessing traffic complexity and managing congestion [75].
The Maximum Entropy principle is particularly well-suited for this task,
offering a robust method for dealing with the inherent uncertainties and
variabilities of air traffic data. This principle allows us to derive the most
generalized form of a probability distribution by maximizing entropy, sub-
ject to the empirical constraints provided by observed data, such as angular
means or variances.
Prior research has explored various approaches to angular density recon-
struction, with a particular focus on incorporating the principles of Maximum
Entropy to enhance the fidelity of the reconstructed PDFs.
The motivation for employing the Maximum Entropy principle in an-
gular density reconstruction lies in its ability to provide a principled and
data-driven approach to probability distribution estimation. Unlike para-
metric methods that rely on specific functional forms, Maximum Entropy
reconstruction allows for greater flexibility and adaptability to the under-
lying data distribution. This flexibility is particularly advantageous in the
context of air traffic complexity estimation, where traffic patterns can exhibit
significant variability and non-linearity.
In simulation settings, the Maximum Entropy principle is highly effec-
tive for reconstructing angular distributions, especially in air traffic man-
agement. This method involves setting constraints based on available data,
such as average angles and variances, to develop the most unbiased probabil-
ity distributions possible. By maximizing entropy within these constraints,
the principle ensures that the resulting angular distributions in simulations
accurately capture the diverse and unpredictable nature of air traffic pat-
terns. This approach not only improves the realism of the simulations but
also provides deeper insights into the complexities of air traffic dynamics.
3.3 Setting
In this chapter, we address the complexity of air traffic by modeling airspace
as a two-dimensional (2D) image, segmented into a Cartesian grid where each
intersection point denotes the center of a defined zone, referred to as a cell.
For the sake of simplicity, we employ circular cells. For each cell, we apply
the following steps:
(1) Select the trajectories that pass through the cell and determine, for each
one, the angles corresponding to entry and exit points with respect
to a given fixed direction. From such a statistical sample, estimate,
via empirical means, the Fourier coefficients of the angular probability
distributions on the circle.
(a) A map of full trajectories. (b) All trajectories crossing a selected cell. (c) Angular histogram of the selected cell.
Figure 3.1: An example map of trajectories through a selected cell and the
angular histogram of the selected cell.
Each trajectory that intersects the cell enters with an angle (with respect
to a fixed direction) which we call heading and denote by θj . The complex
number eiθj is the corresponding point on the unit circle in C.
We now regard the θj ’s as realizations of a random angle variable θ.
We are then interested in estimating the probability density p(θ) of such a
variable. From the angular sampling θj , we may build a set of empirical
moments. The Fourier coefficients of p are defined as
a_l = (1/π) ∫_0^{2π} p(θ) cos(lθ) dθ,  l ∈ N, (3.3)

and

b_l = (1/π) ∫_0^{2π} p(θ) sin(lθ) dθ,  l ∈ N∗. (3.4)

The corresponding empirical moments computed from the angular sample are

x_l = (1/(πn)) Σ_{j∈J} cos(lθ_j)  and  y_l = (1/(πn)) Σ_{j∈J} sin(lθ_j), (3.5)

which are gathered in the vector z = (x_1, y_1, . . . , x_N, y_N) ∈ R^{2N}.
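The empirical moments of Equation (3.5) can be computed directly from a sample of headings; a minimal sketch (the degenerate sample, with all angles equal, is purely for illustration):

```python
import numpy as np

def empirical_fourier(theta, N):
    """Empirical moments of Eq. (3.5): x_l, y_l estimate the Fourier
    coefficients a_l, b_l of the angular density, interleaved as z in R^{2N}."""
    n = len(theta)
    z = []
    for l in range(1, N + 1):
        z.append(np.cos(l * theta).sum() / (np.pi * n))  # x_l
        z.append(np.sin(l * theta).sum() / (np.pi * n))  # y_l
    return np.array(z)

theta = np.full(500, 0.8)        # degenerate sample: every heading equals 0.8 rad
z = empirical_fourier(theta, 3)
```

For this degenerate sample the moments reduce to cos(l·0.8)/π and sin(l·0.8)/π, which makes the estimator easy to check by hand.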
then aim at solving the following constrained optimization problem:
(P◦)  Min H(p)
      s.t.  p ∈ L¹([0, 2π)),
            1 = ∫_0^{2π} p(θ) dθ,
            x_l = (1/π) ∫_0^{2π} p(θ) cos(lθ) dθ,  l ∈ {1, . . . , N},
            y_l = (1/π) ∫_0^{2π} p(θ) sin(lθ) dθ,  l ∈ {1, . . . , N}.

Here, H(p) denotes the Shannon neg-entropy of p:

H(p) = ∫_0^{2π} p(θ) ln p(θ) dθ.
Therefore, in Problem (P), the squared Mahalanobis distance between Ap
and z is penalized, as a model fitting requirement. Note that Σ is not known
in practice and must be estimated. Using the Mahalanobis distance requires,
in principle, a positive definite estimate of Σ. However, when the estimated
covariance matrix is singular, the inverse does not exist, and we must then
resort to a degenerate version of the Mahalanobis distance, given by

∥z′∥²_{Σ†} = ⟨z′, Σ† z′⟩ if z′ ∈ ran Σ, and ∞ otherwise. (3.6)
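The degenerate distance of Equation (3.6) can be evaluated with the Moore–Penrose pseudo-inverse; a minimal sketch with an illustrative singular Σ:

```python
import numpy as np

def mahalanobis_sq(zp, Sigma, tol=1e-10):
    """Degenerate squared Mahalanobis distance of Eq. (3.6):
    <z', Sigma^+ z'> when z' lies in ran(Sigma), +inf otherwise."""
    pinv = np.linalg.pinv(Sigma)
    # z' is in ran(Sigma) iff projecting it through Sigma Sigma^+ recovers it
    if np.linalg.norm(Sigma @ pinv @ zp - zp) > tol * max(1.0, np.linalg.norm(zp)):
        return float("inf")
    return float(zp @ pinv @ zp)

Sigma = np.array([[2.0, 0.0], [0.0, 0.0]])   # singular covariance (illustrative)
```

A vector in the range of Σ gets a finite value; one with a component outside the range gets +∞, exactly as in (3.6).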
Our aim in this subsection is to show how Fenchel duality can help us solve
our optimization problems via the tractable dual problem. We first review a
few basic definitions and results. The reader unfamiliar with convex analysis
may refer to [100, 134].
3.4.1 Review of useful convex analysis
Let L be any real vector space. A function f : L → [−∞, ∞] is said to be
convex if its epigraph, the set
epi f := {(x, α) ∈ L × R : f(x) ≤ α},
is a convex subset of L×R. It is said to be proper convex if it never takes the
value −∞ and it is not identically equal to ∞. A function g : L → [−∞, ∞]
is said to be concave if −g is convex, and proper concave if −g is proper
convex. Notice that g is concave if and only if its hypograph
hypo g := {(x, α) ∈ L × R : g(x) ≥ α}
is convex. The effective domain of a convex function f is the set
dom f = {x ∈ L : f(x) < ∞}.
The effective domain of a concave function g is the set
dom g = {x ∈ L : g(x) > −∞}.
The only functions that are both convex and concave are the affine functions
and the functions identically equal to ±∞. Clearly, the domain of each
affine function, viewed either as a convex or as a concave function, is all of L.
The functions identically equal to ±∞ are rare enough that they do not call
for specific notation.
It is customary to use indicator functions to encode constraints in opti-
mization problems. Recall that the indicator function of a subset C ⊂ L is
the function
δ_C(x) := 0 if x ∈ C, and ∞ otherwise.
Now, let L and Λ be vector spaces paired (in the algebraic sense) by a
bilinear mapping
⟨·, ·⟩ : L × Λ → R,  (x, ξ) ↦ ⟨x, ξ⟩.
A standard example is L = Λ = R^d with the usual Euclidean scalar product.
The convex conjugate of a function f (convex or not) is the function

f⋆(ξ) := sup{⟨x, ξ⟩ − f(x) : x ∈ L},  ξ ∈ Λ.
The concave conjugate of a function f (concave or not) is the function

f⋆(ξ) := inf{⟨x, ξ⟩ − f(x) : x ∈ L},  ξ ∈ Λ.

f⋆⋆ := (f⋆)⋆ = f.

Then

η := inf_{x∈R^d} {f(x) − g(x)} = sup_{ξ∈R^d} {g⋆(ξ) − f⋆(ξ)}
The above theorem asserts equality between the optimal values of two
problems, together with attainment in the second one. It is customary to
call these underlying optimization problems the primal problem and the dual
problem, respectively. In the above theorem, both the primal and dual are
finite dimensional. Moreover, in order to tackle problems such as (P), we
must put some linear mapping in the picture. This is the purpose of the next
theorem.
3. A : L → Rd , a linear mapping;
one has

η := inf_{x∈L} {F(x) − g(Ax)} = max_{λ∈R^d} {g⋆(λ) − F⋆(A⋆λ)}.
are respectively referred to as the primal and dual problems. The function
D := g⋆ − F ⋆ ◦ A⋆ appearing in the dual problem is referred to as the dual
function. The theorem asserts the equality between the optimal values of the
primal and dual problems, together with dual attainment. The next result
will provide conditions that will guarantee primal attainment as well.
Theorem 3 (Primal attainment). With the notation and assumptions of the
previous theorem, assume in addition that
(QC ⋆ ) ri dom g⋆ ∩ ri dom(F ⋆ ◦ A⋆ ) ̸= ∅.
The latter result provides not only a condition for primal attainment,
but it also suggests, implicitly, a link between
primal and dual solutions. The bi-conjugate relationships in Assumption (a)
are central in the theorem, and the difficulty in our problem is to prove that
the entropy satisfies this property. It turns out that, in our context, it is
possible to compute the conjugate of the Shannon entropy by conjugating
through the integral sign.
Generally speaking, an integral functional is a functional of the form
H(u) = ∫_Ω h(u(ω), ω) dµ(ω),  u ∈ L.
Theorem 4 (Rockafellar). Let L and Λ be spaces of measurable functions
on Ω paired by means of the standard integral bilinear form
⟨f, φ⟩ = ∫_Ω f(ω) φ(ω) dω.
sign is permitted. Assume that, as in Theorem 3, H ⋆⋆ = H and g⋆⋆ = g.
Assume finally that the conjugate integrand h⋆ is differentiable over R, and
that there exists some dual-optimal vector λ̄ in int dom D. If
ū(ω) := (h⋆)′([A⋆λ̄](ω)),  ω ∈ Ω,
Moreover,
(g◦)⋆(λ◦, λ) = λ◦ + ⟨z, λ⟩ − (1/(2α)) ∥λ∥²_Σ.
Accounting for the fact that, as we have seen above, the entropy can be
conjugated by conjugating through the integral sign, the dual problem reads:
which reduces to

0 = z − (1/α) Σλ̄ − [∫_0^{2π} T(θ) exp⟨λ̄, T(θ)⟩ dθ] / [∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ].
It should be noticed that the above system is also the optimality system of
the problem
(D̃)  Maximize  ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ
      s.t.  λ ∈ R^{2N},

whose objective is the dual function

D̃(λ) := ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ.
The function h⋆◦ (τ ) = exp(τ − 1) obviously meets the requirements of
Theorem 5. Provided we can obtain a dual solution (λ̄◦ , λ̄), the optimal
density is then given by
p̄(θ) = exp[λ̄◦ − 1 + ⟨λ̄, T(θ)⟩] = exp⟨λ̄, T(θ)⟩ / ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ, (3.8)
Remark 7. The primal dual relationship obtained above shows that the
solution to our Maximum Entropy problem exists and belongs to the expo-
nential family. The particular form in Equation (3.8) is well known. In the
unrelaxed form of Problem (P), it can be obtained using classical inequal-
ities (see e.g. [64], cited in [49]). It is worth noticing that, in [50], similar
results are obtained without resorting to Fenchel duality. However, we wish
to stress some advantages of resorting to Partially Finite Convex
Programming. First, we notice that the latter formalism can deal in a very
elegant way with problems in which constraints are relaxed. More impor-
tantly, by using our approach, we learn that the appropriate value of the
Lagrange multiplier maximizes the well behaved dual function, which offers
a powerful way to derive a computationally efficient algorithm. Finally, we
see that the proposed framework may be used with a variety of entropy
functionals (such as Burg [23], Rényi [97], Tsallis [117, 80], etc.). Although this
goes beyond the scope of this chapter, it certainly deserves to be mentioned.
The above developments yield the following algorithm for the computation
of Maximum Entropy densities.
Algorithm 2 Computing Maximum Entropy densities
1: Input: N ∈ N∗ , z ∈ R2N , α > 0.
2: Output: The Maximum Entropy probability angular distributions
3: Maximize the dual function

       D̃(λ) := ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ.

4: From the dual optimal solution λ̄ obtained in the previous step, form the
   optimal density

       p̄(θ) = exp⟨λ̄, T(θ)⟩ / ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ.
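Algorithm 2 can be prototyped by discretizing the integrals on a quadrature grid; the choice Σ = I, the value of α, and the simulated heading sample below are illustrative assumptions, not the thesis's actual settings:

```python
import numpy as np
from scipy.optimize import minimize

def moment_map(theta, N):
    """Rows are cos(l*theta)/pi and sin(l*theta)/pi for l = 1..N."""
    rows = []
    for l in range(1, N + 1):
        rows.append(np.cos(l * theta) / np.pi)
        rows.append(np.sin(l * theta) / np.pi)
    return np.array(rows)                              # shape (2N, K)

def maxent_density(z, N, alpha=1e4, K=2000):
    """Maximize the dual function D~ by BFGS on a quadrature grid, then form
    the density via Eq. (3.8). Sigma = I is an illustrative simplification."""
    theta = np.linspace(0.0, 2 * np.pi, K, endpoint=False)
    dtheta = 2 * np.pi / K
    Tm = moment_map(theta, N)

    def neg_dual(lam):
        s = lam @ Tm                                   # <lam, T(theta)> on the grid
        m = s.max()                                    # log-sum-exp stabilization
        log_partition = m + np.log(np.exp(s - m).sum() * dtheta)
        return -(lam @ z - (lam @ lam) / (2 * alpha) - log_partition)

    lam = minimize(neg_dual, np.zeros(2 * N), method="BFGS").x
    s = lam @ Tm
    p = np.exp(s - s.max())
    p /= p.sum() * dtheta                              # Eq. (3.8), normalized
    return theta, p, lam

# Empirical moments (Eq. (3.5)) of simulated headings concentrated near pi
rng = np.random.default_rng(1)
samples = rng.normal(np.pi, 0.4, 5000) % (2 * np.pi)
N = 4
z = np.array([f(l * samples).sum() / (np.pi * len(samples))
              for l in range(1, N + 1) for f in (np.cos, np.sin)])
theta, p, lam = maxent_density(z, N)
```

At the optimum the reconstructed density integrates to one and its Fourier moments match the data up to the relaxation term Σλ̄/α, mirroring the optimality system above.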
are looking for the most unbiased information on the underlying random
variable (the angle θ) so that fine (high resolution) details are unwanted;
moreover, the empirical Fourier coefficients are expected to become poorer
approximations of the true ones as the frequency l increases, in particular
when the number of angular samples is low. It is therefore only natural to put
a limit N on the highest frequency to be accounted for. The dimension of the
dual variable will then be 2N . In the context of simulated random angles, we
observe that the Kullback-Leibler entropy of the true density p◦ relative to
the reconstructed density p decreases as N increases, and stabilizes beyond
some value of N. Put differently, beyond a certain value of N, the
gain in information provided by additional Fourier coefficients is negligible
or non-existent.
For example, Figure 3.2 illustrates that there is no gain beyond N = 50
in our simulation, indicating that higher frequencies do not substantially
improve the approximation of the true distribution. This finding supports the
principle of using a limited number of Fourier coefficients to avoid overfitting
and unnecessary computational complexity in the estimation process.
lection rules. Among these rules, the Morozov discrepancy principle states
that α should be such that the corresponding solution p̄ should give a residual
∥z − Ap̄∥ equal to a number strictly greater than, but close to, the estimated
size ρ of the error on the data. Therefore, in order to implement the Moro-
zov principle, a reasonable upper bound on the error should be known, an
assumption that we make in the context of this chapter.
Our strategy is as follows: starting from an initial guess α◦ > 0, we
compute the corresponding Maximum Entropy solution, which we now de-
note by p̄α◦ . We then compute the residual ∥z − Ap̄α◦ ∥. If the latter residual
is above the value 1.105ρ, then set α1 > α◦ ; otherwise set α1 < α◦ . We
then iterate on this process, until α gives a residual approximately equal to
1.105ρ. In practice, a dichotomy strategy can be used so as to converge to the
appropriate value of α; the details are given in Algorithm 3.
10: end if
11: if residual > 1.105ρ then
12:     Set αmin = αi
13:     Set αi+1 = (αmin + αmax)/2 if αmax < ∞, and µαi otherwise
14: end if
15: Set i = i + 1
16: end while
17: Return last value of α
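The dichotomy described above can be sketched as follows; `residual_of` is a hypothetical stand-in for solving the Maximum Entropy problem at a given α and returning ∥z − Ap̄_α∥, assumed monotonically decreasing in α:

```python
def select_alpha(residual_of, rho, alpha0=1.0, mu=10.0, tol=1e-3, max_iter=100):
    """Morozov-style dichotomy: find alpha whose residual is ~ 1.105*rho.
    residual_of(alpha) stands in for ||z - A p_alpha|| after solving the
    Maximum Entropy problem at that alpha (assumed decreasing in alpha)."""
    target = 1.105 * rho
    a_min, a_max = 0.0, float("inf")
    alpha = alpha0
    for _ in range(max_iter):
        r = residual_of(alpha)
        if abs(r - target) <= tol * target:
            return alpha
        if r > target:                     # residual too large: increase alpha
            a_min = alpha
            alpha = 0.5 * (a_min + a_max) if a_max < float("inf") else mu * alpha
        else:                              # residual below target: decrease alpha
            a_max = alpha
            alpha = 0.5 * (a_min + a_max)
    return alpha

# Toy monotone residual, purely for illustration: r(alpha) = 5 / (1 + alpha)
alpha_star = select_alpha(lambda a: 5.0 / (1.0 + a), rho=1.0)
```

Until an upper bracket αmax is found, the step grows geometrically by µ; once the target residual is bracketed, plain bisection takes over.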
3.5 Simulations
This simulation will allow us to get an idea of the sample size that is necessary
for obtaining relevant information, merely by comparing the true density with
the one obtained through the maximization of entropy.
We start with the design of densities on [0, 2π). To this end, we recall
the definition of the bump function:
ψ◦(θ) = exp(−1/(1 − θ²)) if θ ∈ (−1, 1), and 0 otherwise.
Recall that ψ◦ is a C ∞ -function, whose support is indeed the compact interval
[−1, 1]. We can emulate the behaviour of a Dirac distribution by letting
ψ_β(θ) := (1/(βc)) ψ◦(θ/β), in which c := ∫ ψ◦(θ) dθ.
Here, β is a strictly positive parameter. The closer β to zero, the more ψβ
is concentrated about the origin. It is readily seen that the integral of ψβ is
equal to 1 whatever may be the value of β, and that the support of ψβ is the
interval [−β, β].
We may now define, for the purpose of simulation, densities of the form
p◦(θ) = Σ_{k=1}^{p} c_k ψ_{β_k}(θ − θ_k), (3.9)
in which the ck ’s are positive numbers that sum up to 1. The translation
parameters θk are chosen in such a way that the support of p◦ is contained
in [0, 2π]. We then simulate angular samples that follow such a distribution.
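The bump construction and the mixture density of Equation (3.9) can be sketched as follows; the peak locations, weights, and β below are illustrative choices, and rejection sampling stands in for whichever sampler was actually used:

```python
import numpy as np

def bump(theta):
    """C-infinity bump psi_0, supported on [-1, 1]."""
    theta = np.asarray(theta, dtype=float)
    out = np.zeros_like(theta)
    inside = np.abs(theta) < 1
    out[inside] = np.exp(-1.0 / (1.0 - theta[inside] ** 2))
    return out

_grid = np.linspace(-1.0, 1.0, 20001)
_vals = bump(_grid)
c = np.sum((_vals[1:] + _vals[:-1]) * np.diff(_grid)) / 2   # normalizing constant

def psi(theta, beta):
    """Rescaled bump psi_beta with unit integral and support [-beta, beta]."""
    return bump(np.asarray(theta) / beta) / (beta * c)

def p0(theta, centers=(2.0, 4.5), weights=(0.4, 0.6), beta=0.1):
    """Mixture density of Eq. (3.9); centers and weights are illustrative."""
    return sum(w * psi(np.asarray(theta) - t, beta) for w, t in zip(weights, centers))

def sample(n, rng):
    """Draw angles from p0 by rejection against a uniform proposal on [0, 2*pi)."""
    pmax = p0(np.linspace(0, 2 * np.pi, 40001)).max()
    out = np.empty(0)
    while len(out) < n:
        cand = rng.uniform(0, 2 * np.pi, n)
        accept = rng.uniform(0, pmax, n) < p0(cand)
        out = np.concatenate([out, cand[accept]])
    return out[:n]

rng = np.random.default_rng(0)
angles = sample(500, rng)
```

Every accepted angle falls strictly inside one of the bump supports, so the simulated histogram reproduces the designed peaks.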
To visually demonstrate the effectiveness of our simulation in capturing
the original distribution’s characteristics, we proceed with a specific example.
Figure 3.3 shows a simulation example based on the original probability
density p◦(θ) with 2 peaks and β = 0.1: it displays the underlying (original)
density p◦(θ) together with the histogram obtained by simulation.
Following the successful validation of the simulation, we proceed to refine
our model to enhance its performance further.
Figure 3.5: Residual values for α ranging from 1 to 10000.
Reconstructions
simpler scenarios with fewer peaks.
In the more complex five-peak scenario (as shown in Figure 3.7), the
reconstructed density still captures the general shape and peak positions of
the original density, which is commendable given the increased complexity.
In all situations, the reconstructed density closely mirrors the shape of
the original density, successfully capturing the positions and general form of
the peaks. However, there are minor differences, particularly in the central
peak where the optimal density does not perfectly align, hinting at slight
deviations. Despite these small deviations, the optimized distribution is a
close approximation of the original distribution.
Figure 3.6: Original and optimal densities with two peaks.
The slight deviations in peak heights and widths leave room for refinement,
but overall the optimization algorithm performs effectively. It succeeds in
identifying and adapting to the multiple modes of the distribution, a
particularly positive outcome in the challenging multi-peak setting. Further
fine-tuning of the algorithm might lead to even closer alignment, enhancing
the model's precision.
The effectiveness of the BFGS algorithm in maximizing concave, smooth,
real-valued dual functions is further explored by comparing its performance
with several other optimization methods, including Powell [92], Nelder-Mead,
the conjugate gradient method, and truncated Newton (TNC), described below.
Figure 3.7: Original and optimal densities with five peaks.
2. Powell method. It is a derivative-free optimization method using a
series of linear searches to find the minimum value of a function. The
optimization process begins with a set of initial directions (usually the
unit vectors of the coordinate axes). At each iteration, the method
performs a linear search along each of these directions to reduce the
value of the objective function f (x).
Given a set of directions {d_1, d_2, . . . , d_n}, this method finds the optimal
point x_{k+1} by performing a minimization along each direction d_i as:

x_{k+1} = x_k + λ_i d_i,
At the end of a cycle, a new search direction d_new = x_n − x_0 is added to
the set.

3. Nelder-Mead method. It is a derivative-free simplex method. Denoting
by x_h the worst point of the current simplex, x_l the best point, and x_c
the centroid of the points other than x_h, each iteration applies the
following operations:

• Reflection: reflect the worst point through the centroid:
  x_r = x_c + α(x_c − x_h).

• Expansion: if reflection succeeds, push further in the same direction:
  x_e = x_c + γ(x_r − x_c)
  with γ > 1.

• Contraction: If reflection fails, try contracting the point x_r towards x_c:
  x_c′ = x_c + ρ(x_h − x_c)
  with 0 < ρ < 1.

• Shrinkage: If contraction also fails, shrink the entire simplex
  towards the best point x_l:
  x_i = x_l + σ(x_i − x_l)
  with 0 < σ < 1.
4. Conjugate gradient method. Successive search directions are built from
gradients using (in the Fletcher–Reeves form)

β_{k−1} = [∇f(x_k)ᵀ ∇f(x_k)] / [∇f(x_{k−1})ᵀ ∇f(x_{k−1})],

and the iterate is updated along the direction d_k as

x_{k+1} = x_k + α_k d_k.

5. Truncated Newton (TNC) method. At each step, the Newton system

B_k p_k = −∇f(x_k)

must be solved for the search direction p_k.
In TNC, $p_k$ is approximated by truncating the iterations of the conjugate gradient method, finding a near-optimal solution. The next point is updated as
$$x_{k+1} = x_k + \alpha_k p_k,$$
where $\alpha_k$ is the optimal step size.
$$x_{k+1} = x_k + p_k.$$
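Such a comparison is easy to reproduce through SciPy's common `minimize` front-end. The sketch below uses a toy smooth concave objective (not the thesis's dual function) and an illustrative subset of methods; gradients are supplied only to the methods that use them.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative comparison of optimizers through SciPy's common front-end.
# The objective is a toy smooth concave function (NOT the thesis's dual):
# maximize f(x) = -||x - c||^2 by minimizing its negative.
c = np.array([1.0, -2.0, 0.5])

def neg_f(x):
    return np.sum((x - c) ** 2)

def neg_grad(x):
    return 2.0 * (x - c)

x0 = np.zeros(3)
for method in ("BFGS", "Powell", "Nelder-Mead", "CG", "Newton-CG", "TNC"):
    # Gradient-based methods receive the analytic gradient; the
    # derivative-free ones (Powell, Nelder-Mead) ignore it.
    jac = neg_grad if method in ("BFGS", "CG", "Newton-CG", "TNC") else None
    res = minimize(neg_f, x0, jac=jac, method=method)
    print(f"{method:12s} nfev={res.nfev:4d}  x={np.round(res.x, 4)}")
```

All methods should recover the maximizer $x = c$; the function-evaluation counts give a first feel for their relative cost on smooth problems.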
Figure 3.8: Comparison of computational time by seven optimal models with
five peaks.
Various measures of complexity may be envisaged from the obtained probability density. In the context of aircraft trajectories, complexity measures are used to understand and quantify the level of predictability and potential conflicts in these trajectories. Several key points are worth mentioning:
3.7 Conclusions
for complexity assessment, supporting more sophisticated air traffic control
strategies.
This research not only establishes a robust mathematical framework for
estimating angular density but also paves the way for significant advance-
ments in managing the complexities of growing air traffic. As the volume
of air traffic continues to increase, the relevance and necessity of deploying
advanced analytical tools such as those developed in this study will become
ever more critical.
Chapter 4
4.1 Introduction
4.1.1 Motivation
Air traffic complexity involves the dynamic interactions and trajectories of
aircraft within the airspace. This chapter presents a novel methodology for
constructing complexity maps that strategically anticipate and manage air
traffic challenges. By developing advanced complexity measures, we aim
to enhance ATM systems and ensure efficient and safe operations through
pre-emptive planning.
Current airspace management faces significant constraints due to the
rigid structures and the cognitive limitations of ATCOs. They are tasked
with safely managing an increasing number of aircraft, especially within congested sectors. Despite technological advancements in ATM, the combination of fixed airspace configurations and controller workload constraints leads to inefficiencies and potential safety risks. As air traffic continues to grow,
these challenges become more pronounced, necessitating innovative solutions
to maintain and improve safety and efficiency.
Proactive development of complexity measures and maps is essential for
effective ATM. Complexity maps provide valuable insights into traffic pat-
terns and system responses to various conditions, facilitating better strategic
planning. Lee et al. (2009) [68] demonstrated how complexity maps can op-
timize aircraft entry points and manage sector boundaries, while Hong et al.
(2014) [57] proposed a Variance Quadtree-based Map for resolving conflicts
by identifying optimal strategies for aircraft entries into sectors.
Further, studies by Juntama et al. (2020) [63] and Delahaye et al.
(2022) [36] showed that complexity maps are instrumental in optimizing 4D
trajectories and evaluating local traffic disorder. These tools are crucial
for enhancing air traffic control operations, reducing delays, and improving
overall airspace capacity by anticipating and managing complexity before it
arises. By leveraging complexity maps, ATM systems can better handle the
intricacies of air traffic, ensuring that safety and efficiency are maintained
even as traffic volumes increase.
4.1.2 Overview
Our methodology for constructing air traffic complexity maps begins with
discretizing the airspace using a Cartesian grid. At each grid point, we define
a circular cell with a radius compliant with safety regulations. For each cell,
we compute an angular distribution of trajectories using the principle of
Maximum Entropy.
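The discretization step can be sketched as follows; the function and variable names, the grid, and the cell radius are our own illustrative choices, not taken from the thesis:

```python
import numpy as np

# Rough sketch: for each node of a Cartesian grid over the airspace, gather
# the headings of trajectory points falling inside a circular cell centered
# at that node. The per-cell heading samples feed the Maximum Entropy fit.
def cell_headings(points, headings, grid_xy, radius):
    """points: (n, 2) positions; headings: (n,) angles in [0, 2*pi);
    grid_xy: (m, 2) grid nodes. Returns one heading array per node."""
    return [headings[np.linalg.norm(points - g, axis=1) <= radius]
            for g in grid_xy]

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(500, 2))
hdg = rng.uniform(0.0, 2.0 * np.pi, size=500)
grid = np.array([[x, y] for x in (0.0, 5.0, 10.0) for y in (0.0, 5.0, 10.0)])
cells = cell_headings(pts, hdg, grid, radius=2.0)
print(len(cells))  # 9 cells, one per grid node
```

In practice the radius would be set in accordance with the safety regulations mentioned above.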
This distribution thus obtained belongs to the family of Generalized von
Mises (GvM) distributions. The determination of the probability density
of the underlying angular variable is achieved using Fenchel duality, where
empirical moment constraints are relaxed in accordance with the estimated
covariance matrix of the data, implying the use of Mahalanobis distance.
The details of this strategy were proposed in Chapter 3, and the key points
are summarized in Subsection 4.2.1. We then calculate the geodesic distance
between this distribution and the uniform distribution, where a smaller dis-
tance indicates higher local traffic complexity. The local complexity index
takes into account this distance and the local intensity of the traffic.
To manage the computational intensity of evaluating the geodesic dis-
tance, we employ a practical approximation using the symmetrized Kullback-
Leibler (SKL) divergence. This allows for efficient estimation of local traffic
complexity across the grid, making the methodology feasible for real-time
applications.
We will see in Subsection 4.3.1 that the natural parametrization of the
GvM distribution manifold suggested by Fenchel duality, which corresponds
to the Maksimov form, theoretically allows for the calculation of this geodesic
distance by solving a relatively high-dimensional system of ordinary differ-
ential equations. For reasons of algorithmic and numerical simplicity, we will
then use the classical approximation of the geodesic distance by the sym-
metrized Kullback-Leibler divergence.
Figure 4.1: Example of a cell around a grid point in the airspace, with
trajectories across ATL on December 26, 2018.
the affine space {z} + ran Σ. If Σ is positive definite, then Σ† is merely the
inverse of Σ, and ∥z − Ap∥Σ† is the Mahalanobis distance between z and Ap.
The Maximum Entropy problem reads
$$(\mathcal{P}) \qquad \text{Minimize } H(p) + \frac{\alpha}{2}\, \| z - Ap \|_{\Sigma^\dagger}^2 \quad \text{s.t. } 1 = \int_S p(\theta)\, d\theta,$$
where
$$H(p) = \int_S h\big(p(\theta)\big)\, d\theta \quad \text{with} \quad h(t) := \begin{cases} t \ln t & \text{if } t > 0,\\ 0 & \text{if } t = 0,\\ \infty & \text{if } t < 0, \end{cases}$$
and $\alpha > 0$ fixes the balance between the entropy and the data-attachment term. It was shown in [86] that a Fenchel duality approach makes it possible to solve the infinite-dimensional constrained problem ($\mathcal{P}$) via a finite-dimensional unconstrained smooth concave maximization problem, which moreover involves $\Sigma$ rather than $\Sigma^\dagger$. The solution to Problem ($\mathcal{P}$) is given by
$$\bar p(\theta) = \frac{\exp \big\langle \bar\lambda,\, T(\theta) \big\rangle}{\displaystyle\int_S \exp \big\langle \bar\lambda,\, T(\theta) \big\rangle\, d\theta}, \qquad (4.2)$$
Observe that:
$$p(\theta) = \frac{1}{G_0} \exp\left( \sum_{k=1}^{N} \kappa_k \cos k(\theta - \mu_k) \right), \qquad (4.3)$$
where
$$G_0 = \int_S \exp\left( \sum_{k=1}^{N} \kappa_k \cos k(\theta - \mu_k) \right) d\theta, \qquad (4.4)$$
which is the canonical form of an exponential distribution:
$$p(\theta) = \exp\big[ \langle \lambda, T(\theta) \rangle - \ln G_0 \big]. \qquad (4.7)$$
Proposition 10. The score satisfies:
Proof. Taking the second derivative of the score yields:
$$\partial_i \partial_j l_\theta = \frac{\partial_i \partial_j p_\theta}{p_\theta} - \frac{\partial_i p_\theta\, \partial_j p_\theta}{p_\theta^2}. \qquad (4.18)$$
Geodesic distances
• ∇X Y − ∇Y X = [X, Y ].
Proposition 14. Under the same assumptions as in Proposition 13, if $\gamma : [a, b] \to M$ is a geodesic, then
$$\nabla_{\dot\gamma}\, \dot\gamma = 0. \qquad (4.22)$$
The coefficients $\Gamma^k_{ij}$ are called the Christoffel symbols of the connection $\nabla$.
Proposition 18. In coordinates, Equation (4.22) becomes:
$$\ddot\gamma^k + \Gamma^k_{ij}\, \dot\gamma^i \dot\gamma^j = 0, \qquad k = 1, \dots, N. \qquad (4.24)$$
in which $l_\eta(x) := \ln p_\eta$. Using Equation (4.25), it follows that
$$\partial_{\lambda_i} G_0 = \int_S T_i(\theta) \exp \langle \lambda, T(\theta) \rangle \, d\theta \qquad (4.29)$$
and
$$\partial^2_{\lambda_i \lambda_j} G_0 = \int_S T_i(\theta)\, T_j(\theta) \exp \langle \lambda, T(\theta) \rangle \, d\theta. \qquad (4.30)$$
Introducing the Fourier coefficients of $\varphi$,
$$a_k(\varphi) = \frac{1}{\pi} \int_S \varphi(\theta) \cos(k\theta)\, d\theta, \quad k \in \mathbb{N}, \qquad \text{and} \qquad b_k(\varphi) = \frac{1}{\pi} \int_S \varphi(\theta) \sin(k\theta)\, d\theta, \quad k \in \mathbb{N}^*,$$
we see that $G_0 = \pi a_0(\varphi)$ and that the derivatives are given by
$$\partial_{\lambda_i} G_0(\lambda) = \begin{cases} \pi a_k(\varphi) & \text{if } i = 2k-1,\\ \pi b_k(\varphi) & \text{if } i = 2k, \end{cases}$$
and
$$\partial^2_{\lambda_i \lambda_j} \ln G_0 = \begin{cases} \dfrac{1}{2}\, \dfrac{a_{k+l} + a_{k-l}}{a_0} - \dfrac{a_k a_l}{a_0^2} & \text{if } i = 2k-1,\ j = 2l-1,\\[2mm] \dfrac{1}{2}\, \dfrac{b_{k+l} - b_{k-l}}{a_0} - \dfrac{a_k b_l}{a_0^2} & \text{if } i = 2k-1,\ j = 2l,\\[2mm] \dfrac{1}{2}\, \dfrac{b_{k+l} + b_{k-l}}{a_0} - \dfrac{b_k a_l}{a_0^2} & \text{if } i = 2k,\ j = 2l-1,\\[2mm] \dfrac{1}{2}\, \dfrac{-a_{k+l} + a_{k-l}}{a_0} - \dfrac{b_k b_l}{a_0^2} & \text{if } i = 2k,\ j = 2l. \end{cases}$$
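These Fourier-coefficient formulas lend themselves to direct numerical evaluation. The sketch below (our own illustrative parameters, single harmonic $N = 1$) computes the $a_k$ by the rectangle rule on a periodic grid and cross-checks one second-derivative entry of the case formula against the equivalent variance computation under the density proportional to $\varphi$:

```python
import numpy as np

# Numerical sketch (illustrative, N = 1): Fourier coefficients of
# phi(theta) = exp<lambda, T(theta)> on a uniform periodic grid, then the
# entry of the case formula for i = 2k-1, j = 2l-1 with k = l = 1.
M = 4096
theta = np.arange(M) * 2.0 * np.pi / M
dtheta = 2.0 * np.pi / M
kappa, mu = 1.2, np.pi / 3
phi = np.exp(kappa * np.cos(theta - mu))

def a(k):  # a_k(phi) = (1/pi) * integral of phi(theta) cos(k theta)
    return dtheta * np.sum(phi * np.cos(k * theta)) / np.pi

k = l = 1
g_entry = 0.5 * (a(k + l) + a(k - l)) / a(0) - a(k) * a(l) / a(0) ** 2
print(g_entry)  # equals Var_p(cos theta) for p proportional to phi
```

The rectangle rule is spectrally accurate here because the integrand is smooth and periodic.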
puted pretty much the same way using the well-known formula:
$$\Gamma^p_{ij} = \frac{1}{2}\, g^{pl} \left( \frac{\partial g_{li}}{\partial x^j} + \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l} \right), \qquad (4.31)$$
where the convention of summation on repeated indices is taken and the
notation g pl stands for the (p, l) element of the inverse matrix g −1 . The third
partial derivative of ln G0 with respect to λi , λj , λp is readily obtained from
Equation (4.28):
All terms in Equation (4.32) are already known except for the first one.
Starting with Equation (4.29), all third partial derivatives can be obtained
using the Fourier coefficients of φ as indicated below:
$$\partial^3_{\lambda_{2i} \lambda_{2j} \lambda_{2k}} G_0 = \frac{1}{8} \left( a_{i+j+k} + a_{i+j-k} + a_{i-j+k} + a_{i-j-k} \right),$$
$$\partial^3_{\lambda_{2i} \lambda_{2j} \lambda_{2k+1}} G_0 = \frac{1}{8} \left( b_{i+j+k} - b_{i+j-k} + b_{i-j+k} - b_{i-j-k} \right),$$
$$\partial^3_{\lambda_{2i} \lambda_{2j+1} \lambda_{2k+1}} G_0 = \frac{1}{8} \left( -a_{i+j+k} + a_{i+j-k} + a_{i-j+k} - a_{i-j-k} \right),$$
$$\partial^3_{\lambda_{2i+1} \lambda_{2j+1} \lambda_{2k+1}} G_0 = \frac{1}{8} \left( -b_{i+j+k} + b_{i+j-k} + b_{i-j+k} - b_{i-j-k} \right).$$
The geodesic distance between two GvMN distributions p, q is the in-
fimum of the lengths of smooth paths joining p and q. It is the length of a
minimizing geodesic γ with endpoints p, q. Such a curve is the solution of a
differential equation with boundary conditions:
$$\begin{cases} \nabla_{\dot\gamma}\, \dot\gamma = 0,\\ \gamma(0) = p, \quad \gamma(1) = q, \end{cases} \qquad (4.32)$$
with $p = (p^1, \dots, p^{2N})$, $q = (q^1, \dots, q^{2N})$ in $\lambda$ coordinates. Finding the geodesic distance between two GvMN distributions can be accomplished numerically by a boundary value problem solver or a shooting algorithm. The computational cost is quite high, but it turns out on some examples that a rather good approximation is given by the SKL divergence. It was therefore decided to validate the concept using this value instead of the true one, leaving the design of an efficient geodesic algorithm for future developments.
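To illustrate the boundary-value formulation, the sketch below solves the geodesic equation with fixed endpoints using `scipy.integrate.solve_bvp` on the round sphere in $(\theta, \phi)$ coordinates, whose Christoffel symbols are known in closed form; this is a stand-in example, not the GvM manifold of the thesis:

```python
import numpy as np
from scipy.integrate import solve_bvp, trapezoid

# Geodesic as a boundary value problem: gamma'' + Gamma gamma' gamma' = 0
# with fixed endpoints, on the sphere metric d theta^2 + sin^2(theta) d phi^2.
def rhs(t, y):
    th, ph, dth, dph = y
    ddth = np.sin(th) * np.cos(th) * dph ** 2
    ddph = -2.0 * (np.cos(th) / np.sin(th)) * dth * dph
    return np.vstack([dth, dph, ddth, ddph])

p, q = (np.pi / 2, 0.0), (np.pi / 2, 1.0)   # endpoints on the equator

def bc(ya, yb):  # pin both endpoints of the curve
    return np.array([ya[0] - p[0], ya[1] - p[1], yb[0] - q[0], yb[1] - q[1]])

t = np.linspace(0.0, 1.0, 50)
y_guess = np.vstack([np.full_like(t, np.pi / 2), t * q[1],
                     np.zeros_like(t), np.full_like(t, q[1])])
sol = solve_bvp(rhs, bc, t, y_guess)
th, ph, dth, dph = sol.sol(t)
# Geodesic length: integral of sqrt(th'^2 + sin^2(th) ph'^2); equals 1.0 here,
# since the minimizing geodesic between the two points runs along the equator.
length = trapezoid(np.sqrt(dth ** 2 + np.sin(th) ** 2 * dph ** 2), t)
print(round(length, 3))
```

On the GvM manifold the right-hand side would instead use the Christoffel symbols of Equation (4.31), assembled from the Fourier-coefficient derivatives above; the high dimension of that system is precisely what motivates the SKL surrogate.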
$$\int_0^{2\pi} f(\theta) \exp\left( \lambda \sum_{i=1}^{N} \kappa_i \cos i(\theta - \mu_i) \right) d\theta = \sum_{\eta} f(\eta)\, e^{\lambda Q(\eta)} \sqrt{\frac{2\pi}{\lambda\, |Q''(\eta)|}} \left( 1 + O(\lambda^{-1}) \right), \qquad (4.34)$$
where
$$Q(\theta) = \sum_{i=1}^{N} \kappa_i \cos i(\theta - \mu_i) \qquad (4.35)$$
and η ranges over the roots of Q′ such that Q′′ (η) < 0, assuming that all
roots are of multiplicity one. Introducing the complex polynomial P :
$$P(z) = -\sum_{j=1}^{N} j \kappa_j\, e^{-i j \mu_j} z^j, \qquad (4.36)$$
the roots of Q′ (θ) are exactly the intersection points of the closed curve
γ : t ∈ [0, 1] → P (ei2πt ) with the real axis and there is exactly N of them
corresponding to maxima of Q.
Now, using Formula (4.34) again for the normalizing factor in Equation (4.3), it follows that
$$\lim_{\lambda \to +\infty} \mathbb{E}[f]_{\lambda\kappa_1, \dots, \lambda\kappa_N;\, \mu_1, \dots, \mu_N} = \frac{\displaystyle\sum_{\xi} |Q''|^{-1/2}(\xi)\, f(\xi)}{\displaystyle\sum_{\xi} |Q''|^{-1/2}(\xi)}, \qquad (4.37)$$
where $\xi$ ranges over the $\eta$ realizing the global maximum of $Q$. So, in the sense of distributions:
$$\lim_{\lambda \to +\infty} p_{\lambda\kappa_1, \dots, \lambda\kappa_N;\, \mu_1, \dots, \mu_N} = \frac{\displaystyle\sum_{\xi} |Q''|^{-1/2}(\xi)\, \delta_\xi}{\displaystyle\sum_{\xi} |Q''|^{-1/2}(\xi)}. \qquad (4.38)$$
The extra degree of freedom gained from the introduction of λ calls for a
normalization condition. One possible choice is to set κN = 1.
4.3 Validation
4.3.1 Simulation
We conduct testing on a single cell to assess the fundamental functioning of
the model. The results from this step provide essential insights into accuracy
and basic performance.
We start by performing simulations. We shall:
(e) Compute the SKL divergence between the GvMN distribution $q(\theta)$ and the uniform distribution $u(\theta)$ as follows:
$$\mathrm{SKL}(q \| u) = \int_0^{2\pi} u(\theta) \ln \frac{u(\theta)}{q(\theta)}\, d\theta + \int_0^{2\pi} q(\theta) \ln \frac{q(\theta)}{u(\theta)}\, d\theta. \qquad (4.39)$$
This simulation will allow us to get an idea of the sample size that is
necessary for obtaining relevant information, merely by comparing the true
density with the one obtained through the maximization of entropy. Then, based on the obtained SKL divergence, we can estimate the complexity.
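Step (e) is straightforward to evaluate on a periodic grid. In the sketch below a simple von Mises density stands in for the fitted GvMN (an illustrative stand-in, not the thesis's model); flattening the density by sending $\kappa \to 0$ drives the divergence toward zero, i.e., toward maximum complexity:

```python
import numpy as np

# SKL divergence between a circular density q and the uniform density
# u = 1/(2*pi), computed with the rectangle rule on a periodic grid.
M = 4096
theta = np.arange(M) * 2.0 * np.pi / M
dtheta = 2.0 * np.pi / M
u = np.full(M, 1.0 / (2.0 * np.pi))

def skl_to_uniform(kappa, mu=0.0):
    w = np.exp(kappa * np.cos(theta - mu))
    q = w / (dtheta * w.sum())            # normalize on the grid
    # KL(u||q) + KL(q||u) collapses to a single integrand:
    return dtheta * np.sum((u - q) * np.log(u / q))

for kappa in (4.0, 1.0, 0.1):
    print(round(skl_to_uniform(kappa), 4))
```

The divergence decreases monotonically as the density flattens, which is exactly the behavior exploited by the complexity index.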
To better understand this relationship, Table 4.1 presents the SKL di-
vergence between u(θ) and the GvMN q(θ) with varying numbers of peaks
and varying peak widths β. The goal is to understand how these parameters
affect the SKL divergence and their implications for calculating complexity
maps in ATM.
Figure 4.2: The GvMN with different β values and peak numbers: Top to
bottom, β = 0.01 and β = 0.3; Left: 1 peak, Right: 20 peaks.
As $\beta$ increases from 0.01 to 0.3 with a single peak, the SKL divergence decreases significantly, from 14.93 to 8.76. This indicates that even with a single peak, making the peak flatter and more spread out (larger $\beta$) can increase the complexity. Similarly, with 20 peaks, the SKL divergence is the lowest among all peak numbers across all values of $\beta$, emphasizing that multiple peaks bring the GvMN closer to the uniform distribution, especially when the $\beta$ value is also large.
These results not only provide insights into how the parameters of the
GvMN affect its divergence from the uniform distribution but also lay the
foundation for calculating complexity maps in ATM. Understanding these
relationships may assist air traffic managers in predicting, planning, and
adjusting air traffic flows to ensure optimal safety and efficiency. Making
the distribution of headings more concentrated can help reduce air traffic
complexity, thereby improving safety and efficiency in air traffic coordination.
4.3.2 Implementation
tributed according to the number of peaks (concentration points) and their
widths.
Building upon the insights gained from simulations, in order to validate
our findings on a real dataset, we proposed a new complexity index, called
Composite Complexity Index, in short, CCI (see Equation (4.40)). It is
weighted by the SKL divergence and the number of flights.
Our analysis utilizes flight data collected from ATL [107]. The dataset
provides a snapshot of flight trajectories on a specific date, enabling a detailed
examination of spatial distributions and flight path characteristics within
the airspace. The heading information from each trajectory is particularly
leveraged in our computation of the complexity map, which is based on
the SKL divergence determined in our simulation phase. These distances
quantify the directional dissimilarities between flight paths, offering insights
into the spatial complexities of aircraft movements around ATL.
Local complexity depends on the intensity of air traffic and on the angular
distribution of trajectories. Each individual quantity is, of course, insuffi-
cient. Intense traffic with a predominantly unidirectional distribution or a
spread distribution with low-intensity traffic is not very complex. Therefore,
we choose, as a complexity index, a criterion that increases with both the
spread of the distribution and the intensity of traffic, named CCI. Since the
spread of the distribution is measured using the SKL divergence, it is natural
to choose, as a complexity criterion, the ratio
$$\mathrm{CCI} = \frac{n}{a + \mathrm{SKL}(q \| u)}, \qquad (4.40)$$
in which n is the number of flights through the cell under consideration and
SKL (q∥u) is the SKL divergence from q to u. The positive parameter a avoids
singularity of the formula in the case where q = u. It is chosen empirically,
in such a way that the contrast in the complexity map is satisfactory.
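A direct transcription of Equation (4.40); the default value of the parameter a below is purely illustrative:

```python
# Composite Complexity Index of Equation (4.40). The index grows with the
# traffic intensity n and with the closeness of the heading distribution q
# to the uniform distribution (i.e., with small SKL divergence).
def cci(n_flights, skl, a=0.1):
    return n_flights / (a + skl)

# A busy cell with near-uniform headings scores far higher than a quiet,
# well-aligned cell.
print(cci(103, 0.2), cci(1, 10.0))
```

Evaluating this index at every grid node yields the complexity map discussed below.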
Dataset
ATL airport on December 26, 2018, which includes 215 flights. The detailed
attributes of each trajectory are outlined in Table 2.1 and Table 2.2.
Figure 4.3 illustrates the one-day traffic density of trajectories across
ATL on December 26, 2018. The highest number of flights occurs around 13:00. In contrast, the lowest flight density is observed
at around 06:00 to 08:00. There are smaller peaks observed in the afternoon
and evening, notably around 18:00 and 21:00. Overall, there is a noticeable
decline in flight activity from midnight to early morning, followed by a sharp
increase starting at 11:00, peaking in the mid-afternoon, and then stabilizing
at a relatively high level until the late evening.
Figure 4.3: Traffic density of trajectories across ATL on December 26, 2018.
All flight trajectories on this day can be seen in Figure 4.4. From the
figure, we can observe several prominent, thick lines suggesting popular flight
corridors, particularly in the northeast-southwest and northwest-southeast
directions.
Figure 4.4: All flight trajectories across ATL on December 26, 2018.
Experimental Results
Figure 4.5: (a) Trajectories; (b) Based on traffic intensity.
are rows from 0 to 6 and columns from 6 to 20 having fewer flights.
Figure 4.5c displays a complexity map based on SKL divergence. The
rows from 5 to 10 and columns from 7 to 15 show high complexity. Conversely,
rows from 0 to 4 and columns from 0 to 6 exhibit lower complexity. This
suggests that the central region experiences significant variability in distances
between points, while the northwestern region has lower variability.
Figure 4.5d is a composite of the previous two factors, providing a more
comprehensive complexity map. The cells at row 12 and columns from 2 to 18
stand out again, indicating high complexity where both SKL divergence and
the number of flights are high. The least complex regions are rows from 0 to 6
and columns from 6 to 20, showing that when both factors are low, the overall
complexity decreases. This combination offers a thorough understanding of
complexity distribution, aiding in identifying areas that require attention in
ATM.
Overall, the complexity map clearly indicates that the point of highest
complexity is at the intersection of the main flight paths. This position
represents the densest air traffic and highest workload for ATCOs. The points
with the lowest complexity correspond to minimal workload for controllers.
For more detail, we compare the distance between the GvM distribution (blue line) and the uniform distribution (green dashed line) in three scenarios: high, medium, and low complexity.
While the cell with index (12,1) has high complexity, with 103 flights crossing it, the lowest complexity index is found in cell (7,5), with only one aircraft flying over. The GvM distributions obtained in these two cells are illustrated in Figure 4.6 and Figure 4.7, respectively.
Figure 4.6 shows three predominant headings (near $\pi$, $3\pi/2$, and $2\pi$). This distribution deviates significantly from the uniform distribution.
The cell (16,16) exhibits a moderate complexity index (see Figure 4.8): the GvM distribution deviates noticeably from the uniform distribution, indicating an average level of complexity in this region. This is illustrated by the mixture of various headings (concentrated in two directions) in the inset diagram, which suggests a moderate level of traffic diversity.
An interesting observation is that although cell (15,7) has only 5 flights, they are concentrated in three directions close to $\pi/6$, $\pi$, and $2\pi$, which increases
Figure 4.6: High complexity cell: many flights, diverse directions.
Figure 4.8: Medium complexity cell: moderate flights, moderate directions.
the complexity index, resulting in a high complexity level for this cell (see
Figure 4.9).
4.4 Conclusions
We have developed a methodology for assessing the local complexity of air traffic. This strategy is decomposed into three steps: we first estimate, at each point of a Cartesian grid, the probability density of aircraft headings in the corresponding cell, using the principle of Maximum Entropy (this was detailed in Chapter 3); then we compute an approximation of the geodesic distance to the uniform distribution, considered for obvious reasons as the distribution of maximum complexity; finally, we define and compute an index of complexity that combines both the angular distribution and the intensity of the local traffic. This index of complexity is evaluated at each point of our grid, yielding a complexity map.
The chosen formula for the index of complexity is one choice among others. However, our construction on real data appears to give what one may expect. Other criteria may be obtained by considering the discrepancy between the heading distributions at the entry and exit points of each cell. In any event, our strategy for building complexity maps appears to be both numerically tractable and efficient. The problem of efficient geodesic distance computation will be addressed in future work.
Chapter 5
5.1 Introduction
In a number of applications, as in the modeling of animal or vegetal species
distributions or in the evaluation of air traffic density, one is facing the prob-
lem of estimating a multivariate probability distribution from observed data
in the form of samples, which are considered as being drawn independently
according to the unknown distribution. In this chapter, we develop a non-
parametric approach based on a combination of Maximum Entropy and the
method of moments.
In [42], such a problem was considered for discrete finite sample spaces,
and tackled by means of the Maximum Entropy principle leading to a finite
dimensional optimization problem under linear moment constraints. Each
moment constraint is obtained via some statistic, that is, by estimating the
expectation of some feature function. Such a problem is tackled by means
of Fenchel duality, which provides a flexible framework for introducing the
relaxation of the above mentioned constraints. A potential function encodes
the type of relaxation (named regularization in the above reference). While
the equality constraint (no relaxation at all) is encoded by means of the
indicator function of a singleton, box constraints are also considered, as well
as least-square relaxation.
The Fenchel duality formalism provides a machinery that contains all
cases. Our main purpose in this chapter is to show that the nice framework
proposed in [42] can be extended to the more general setting where the sample
space is no longer discrete. We therefore consider the optimization of the so-
called differential entropy or of its relative counterpart, the Kullback-Leibler
divergence between probability measures.
Such an extended framework relies on extensions of the Fenchel duality
theorem, as well as on Rockafellar’s theory of convex integral functionals.
Recall that the above equation expresses the expectation of fk under the
empirical measure
$$P_{\mathrm{emp}} = \frac{1}{n} \sum_{j=1}^{n} \delta_{x_j},$$
in which δx denotes the Dirac measure at x. The Maximum Entropy principle
states, in its native form, that among all probability measures P that are
consistent with the moments constraints
$$\langle f_k \rangle = \int f_k\, dP, \qquad k = 1, \dots, m, \qquad (5.2)$$
one should select the one minimizing the Kullback-Leibler divergence with respect to a reference measure $P_\circ$:
$$K(P \| P_\circ) := \begin{cases} \displaystyle\int \ln \frac{dP}{dP_\circ}\, dP & \text{if } P \ll P_\circ,\\ \infty & \text{otherwise}. \end{cases}$$
The corresponding problem reads
$$(\mathcal{P}_\circ) \qquad \text{Minimize } K(P \| P_\circ) \quad \text{s.t. } 1 = \int_\Omega dP, \quad y = \int_\Omega f\, dP.$$
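Concretely, the data enter this formulation only through the empirical moments $y$. A minimal sketch (the sample and the feature functions below are illustrative choices of ours):

```python
import numpy as np

# The targets y_k of Equation (5.2) are expectations of feature functions
# under the empirical measure P_emp, i.e., sample averages.
rng = np.random.default_rng(1)
x = rng.normal(size=5000)                   # i.i.d. draws from the unknown law
features = [lambda t: t, lambda t: t ** 2]  # f_1, f_2
y = np.array([f(x).mean() for f in features])
print(np.round(y, 2))  # close to the true moments (0, 1) of a standard normal
```

These empirical moments then play the role of $y$ in Problem ($\mathcal{P}_\circ$) and its relaxed versions below.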
We therefore obtain the relaxed, functional counterpart of Problem (P◦ ),
namely
$$(\mathcal{P}) \qquad \text{Minimize } H(p) + U(Fp) \quad \text{s.t. } 1 = Ip,$$
in which the variable $p$ is the Radon-Nikodym derivative of the sought measure $P$ with respect to $P_\circ$. The neg-entropy functional $H$ is defined by
$$H(p) := \int h_\circ\big(p(x)\big)\, dP_\circ(x) \quad \text{with} \quad h_\circ(t) := \begin{cases} t \ln t & \text{if } t > 0,\\ 0 & \text{if } t = 0,\\ \infty & \text{if } t < 0. \end{cases}$$
Clearly, this choice enforces the equality Fp = y. In [42], the case of box
constraints was also considered. It corresponds to the choice
$$U_y(s) = U_y^{(1)}(s) = \delta_{B_y}(s) \quad \text{in which} \quad B_y := \prod_{k=1}^{m} [y_k - \beta_k,\, y_k + \beta_k]$$
with $\beta = (\beta_1, \dots, \beta_m) \in (\mathbb{R}_+^*)^m$. Since the convex conjugate of $\delta_{B_y}(\cdot)$ fails to be differentiable, one may consider an approximation of the box constraint that yields a differentiable conjugate, as in [42, Section 5.2]. This will also be
considered in the present chapter. The least-square relaxation corresponds to the choice
$$U_y(s) = U_y^{(2)}(s) = \frac{1}{2\alpha}\, \| y - s \|^2,$$
in which $\| \cdot \|$ denotes the Euclidean norm and $\alpha$ is some positive constant. We may also consider, more generally, the choice
$$U(s) = \frac{1}{2\alpha}\, \| y - s \|_Q^2$$
in which ∥z∥2Q = ⟨z, z⟩Q = ⟨z, Qz⟩ with Q a positive definite symmetric
matrix. An interesting example of this is obtained if we let Q = Σ−1 with Σ
an estimate of the covariance matrix of the vector y: the relaxation then
accounts for the variance of each principal component of the data vector
regarded as a random vector.
Implicitly, the above problem is stated in the space of integrable func-
tions having well-defined integrals against all feature functions. In general,
such a space is a subspace of L1P◦ (Ω), the space of measurable functions on Ω
that are integrable with respect to P◦ . We note that, in the case where all
feature functions are bounded, our natural workspace is L1P◦ (Ω) itself. In
general, however, the natural workspace for Problem (P) is a proper sub-
space of L1P◦ (Ω). The reason for this is that the linear mapping F is well
defined on the space
$$\left\{\, p \in L^1_{P_\circ}(\Omega) \;\middle|\; p f_k \in L^1_{P_\circ}(\Omega),\ k = 1, \dots, m \,\right\}.$$
Generally speaking, the implicit workspace involved in any moment problem that involves feature functions taken in a set $\Phi$ is the Köthe space
$$L_{P_\circ}(\Omega; \Phi) := \left\{\, p \in \mathrm{Bor}(\Omega) \;\middle|\; \forall f \in \Phi,\ p f \in L^1_{P_\circ}(\Omega) \,\right\}. \qquad (5.3)$$
Theorem 20 (Fenchel duality). Suppose we are given L and L⋆ two vector
spaces, algebraically paired by the bilinear mapping ⟨·, ·⟩; A : L → Rd , a
linear mapping and A⋆ : Rd → L⋆ , its adjoint; H : L → (−∞, ∞], a proper
convex functional and H ⋆ : L⋆ → (−∞, ∞], its convex conjugate; G : Rd →
[−∞, ∞), a proper concave function and G⋆ : Rd → [−∞, ∞), its concave
conjugate. If the constraint qualification condition
$$(\mathrm{QC}) \qquad \mathrm{ri}(A\, \mathrm{dom}\, H) \cap \mathrm{ri}(\mathrm{dom}\, G) \neq \emptyset$$
is satisfied, then
$$\inf_{p \in L} \big\{ H(p) - G(Ap) \big\} = \max_{\lambda \in \mathbb{R}^d} \big\{ G^\star(\lambda) - H^\star(A^\star \lambda) \big\}.$$
The notation ri S stands for the relative interior of the set S, that is, the
interior of S in the induced topology of its affine hull. The effective domain
dom H of a convex function H is the set of points p for which H(p) < ∞,
and the effective domain dom G of a concave function is the set of points η
for which G(η) > −∞. The function G⋆ is the concave conjugate of G,
while H ⋆ denotes the convex conjugate of H. The reader unfamiliar with
convex analysis is invited to refer to [100, 55, 18].
Theorem 20 establishes the equality between the optimal value in the
primal problem (the minimization of the primal function H −G◦A) and that
of the dual problem (the maximization of the dual function G⋆ − H ⋆ ◦ A⋆ ). It
also says that the dual problem is attained, meaning that the dual problem
has at least one solution.
In order to tackle our generic Maximum Entropy problem, it remains to:
(a) H ⋆⋆ = H and G⋆⋆ = G;
(b) there exists λ̄ maximizing the dual function and p̄ ∈ ∂H ⋆ (A⋆ λ̄) such
that H ⋆ ◦ A⋆ has gradient Ap̄ at λ̄.
$$\tilde p(x) := \begin{cases} p(x) & \text{if } x \in \Omega \setminus T,\\ q(x) & \text{if } x \in T, \end{cases}$$
Theorem 22 (Conjugacy through the integral sign). Let $L$ and $L^\star$ be spaces of measurable functions on $\Omega$ paired by means of the integral bilinear form
$$\langle p, \varphi \rangle = \int_\Omega p(x)\, \varphi(x)\, dP_\circ(x).$$
In our context, $L$ and $L^\star$ are respectively the Köthe space $L_{P_\circ}(\Omega; \Phi)$, as defined in Equation (5.3), and its dual Köthe space
$$L^\star_{P_\circ}(\Omega; \Phi) := \left\{\, \varphi \in \mathrm{Bor}(\Omega) \;\middle|\; \forall p \in L_{P_\circ}(\Omega; \Phi),\ p\varphi \in L^1_{P_\circ}(\Omega) \,\right\},$$
with $\Phi = \{ 1_\Omega, f_1, \dots, f_m \}$. It was shown that if a family $\Phi$ of feature functions is such that every $f \in \Phi$ is integrable over every set of finite measure, then $L_{P_\circ}(\Omega; \Phi)$ is decomposable, and moreover that if $1_\Omega \in \Phi$, then $L^\star_{P_\circ}(\Omega; \Phi)$ is also decomposable. See Lemma 1 and Proposition 1 in [78].
The entropy kernel $h(t, x) \equiv h_\circ(t)$ is clearly a measurable integrand that is proper convex and lower semi-continuous in its first argument. Its convex conjugate is given by $h_\circ^\star(s) = \exp(s - 1)$. Since our pair of functional spaces $L_{P_\circ}(\Omega; \Phi)$, $L^\star_{P_\circ}(\Omega; \Phi)$ satisfies the requirements of Theorem 22, it follows that, for every $\varphi \in L^\star_{P_\circ}(\Omega; \Phi)$,
$$H^\star(\varphi) = \int_\Omega \exp\big( \varphi(x) - 1 \big)\, dP_\circ(x).$$
Now, $h_\circ^{\star\star} = h_\circ$ since $h_\circ$ is proper convex and lower semi-continuous. Moreover, it is readily seen that
$$L^{\star\star}_{P_\circ}(\Omega; \Phi) := \left\{\, p \in \mathrm{Bor}(\Omega) \;\middle|\; \forall \varphi \in L^\star_{P_\circ}(\Omega; \Phi),\ p\varphi \in L^1_{P_\circ}(\Omega) \,\right\} = L_{P_\circ}(\Omega; \Phi).$$
See Property (f) page 207 in [78]. It follows that conjugacy through the integral sign applies one more time: for all $p \in L_{P_\circ}(\Omega; \Phi)$,
$$H^{\star\star}(p) = \int_\Omega h_\circ^{\star\star}\big( p(x) \big)\, dP_\circ(x) = H(p).$$
Let us now get back to our Maximum Entropy problem. Most potential
functions U are of the form U (η) = V (y − η) with V a proper convex lower
semi-continuous (and even) function. For example, the standard least-square
relaxation corresponds to the case where
$$V(\eta') = \frac{1}{2\alpha}\, \| \eta' \|^2, \qquad \eta' \in \mathbb{R}^m.$$
An easy computation shows that, in this case, $V^\star(\lambda) = \frac{\alpha}{2} \| \lambda \|^2$. The dual problem to Problem ($\mathcal{P}$) then consists in maximizing the function
$$D(\lambda_\circ, \lambda) := \lambda_\circ + \langle y, \lambda \rangle - V^\star(\lambda) - \exp(\lambda_\circ - 1) \int_\Omega \exp \langle \lambda, f(x) \rangle \, dP_\circ(x), \qquad (5.4)$$
in which we account for conjugacy through the integral sign (Theorem 22).
We note that the effective domain of D coincides with that of V ⋆ whenever
the feature function f has bounded components.
Provided the constraint qualification (QC) and its dual counterpart (QC$^\star$) are satisfied, Theorem 23 tells us that the solution to Problem ($\mathcal{P}$) is given by
$$\bar p(x) = \exp\big[ \bar\lambda_\circ - 1 + \langle \bar\lambda, f(x) \rangle \big] = \exp(\bar\lambda_\circ - 1) \cdot \exp \langle \bar\lambda, f(x) \rangle,$$
in which $(\bar\lambda_\circ, \bar\lambda)$ maximizes the dual function in Equation (5.4). Applying Fermat's principle and eliminating $\lambda_\circ$, we find that $\bar\lambda$ must maximize the reduced dual function
$$\tilde D(\lambda) := \langle \lambda, y \rangle - V^\star(\lambda) - \ln \int_\Omega \exp \langle \lambda, f(x) \rangle \, dx.$$
drawbacks such as failing to have full domain (so that the dual problem is
constrained) or being non-differentiable. A standard example is provided by
the case of box constraints.
In the smooth unconstrained case, quasi-Newtonian methods such as the BFGS algorithm are expected to provide fast and accurate solutions. If $V^\star$ is not differentiable, then the situation is not hopeless at all, thanks to the proximal gradient algorithm. As a matter of fact, in the latter case, the
function to be maximized is then a composite concave function, that is to
say, a sum of two concave functions one being differentiable, the other being
non-differentiable. The reader interested in the optimization of composite
functions and the proximal gradient algorithm may refer to [12, 115] and the
references therein.
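A small end-to-end sketch of the smooth case follows; the domain, features, target moments, and α are all illustrative choices of ours. We maximize the reduced dual with BFGS by minimizing its negative, then recover the primal density from the primal-dual relationship:

```python
import numpy as np
from scipy.optimize import minimize

# Least-squares relaxation, V*(lambda) = (alpha/2) ||lambda||^2, on
# Omega = [0, 1] with the midpoint rule and features f(x) = (x, x^2).
N = 2000
xs = (np.arange(N) + 0.5) / N            # midpoint grid on [0, 1]
dx = 1.0 / N
F = np.vstack([xs, xs ** 2])             # feature values on the grid
y = np.array([0.4, 0.2])                 # illustrative target moments
alpha = 1e-4

def neg_dual(lam):                       # minus the reduced dual function
    logZ = np.log(dx * np.sum(np.exp(lam @ F)))
    return -(lam @ y - 0.5 * alpha * (lam @ lam) - logZ)

lam = minimize(neg_dual, np.zeros(2), method="BFGS").x
p = np.exp(lam @ F)
p /= dx * p.sum()                        # primal density via p ~ exp<lam, f>
print(dx * np.sum(xs * p))               # close to y[0] for small alpha
```

Stationarity of the dual implies that the reconstructed density matches the target moments up to a term of order α, which is the expected effect of the relaxation.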
5.4 Relaxation
No relaxation at all. In this case $V(\eta) = \delta_{\{0\}}(\eta)$. The convex conjugate $V^\star(\lambda)$ is then identically zero. Condition (QC) reads $y \in \mathrm{ri}(F\, \mathrm{dom}\, H)$ in this case, and Condition (QC$^\star$) is automatically satisfied. The reason is that the effective domain of $G^\star$ is then the whole of $\mathbb{R}^m$, and that $\mathrm{ri}\, \mathrm{dom}(H^\star \circ A^\star)$ is nonempty, as the relative interior of a nonempty convex set.
in which ⊙ denotes the pointwise product. In this case, Condition (QC)
reads
$$\left( \prod_{k=1}^{m} (y_k - \beta_k,\, y_k + \beta_k) \right) \cap \mathrm{ri}(F\, \mathrm{dom}\, H) \neq \emptyset.$$
The smooth inner approximation of box constraints with the same box as in
the previous paragraph, and its convex conjugate, are respectively given by
$$V(\eta) = \sum_{k=1}^{m} \varepsilon_k\, \psi_{\beta_k}(\eta_k) \qquad \text{and} \qquad V^\star(\lambda) = \sum_{k=1}^{m} \beta_k \varepsilon_k \ln \operatorname{ch}\!\left( \frac{\lambda_k}{\varepsilon_k} \right),$$
in which εk controls the proximity of V (η) to the target box constraint po-
tential and, thereby, the proximity of the corresponding conjugate functions.
Figure 5.1 gives an idea of the approximation of the box constraint potential
and its conjugate. We note that the constraint qualification condition and
its dual counterpart are exactly as in the box constraint case. The advan-
tage here is that the dual function is differentiable, allowing the use of some
powerful quasi-Newtonian optimization method, such as BFGS.
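Numerically, the conjugate behaves as expected: for small $\varepsilon_k$ it approaches the support function $\sum_k \beta_k |\lambda_k|$ of the box. A quick check with illustrative values:

```python
import numpy as np

# Smooth box-constraint conjugate V*(lambda) = sum_k beta_k eps_k
# ln cosh(lambda_k / eps_k); for eps -> 0 it tends to sum_k beta_k |lambda_k|.
def v_star(lam, beta, eps):
    return np.sum(beta * eps * np.log(np.cosh(lam / eps)))

lam = np.array([2.0, -1.0])
beta = np.array([1.0, 0.5])
for eps in (1.0, 0.1, 0.01):
    print(eps, v_star(lam, beta, np.full(2, eps)))
```

With the values above, the limiting support function equals $1 \cdot 2 + 0.5 \cdot 1 = 2.5$, and the printed values approach it as ε shrinks.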
Figure 5.1: Plots of εψβ (η) and its conjugate for β = 1 and various values
of ε.
5.5 Conclusion
In this chapter, we studied the Maximum Entropy approach to the problem of
reconstructing a spatial density from a finite statistical sample. The moment
constraints associated with the choice of feature functions are relaxed using
potential functions, as in [42].
The dualization of the problem allows the calculation of the solution den-
sity through the maximization of a dual problem in finite dimension, whose
properties depend on the chosen potential function. Partially finite convex
programming, as well as conjugacy of convex integral functionals, constitute
the major ingredients of the proposed developments. In all cases, the dual
problem is tractable, and the primal-dual relationship allows obtaining the
solution.
The tools developed in this chapter will make it possible to solve the
problem of reconstructing spatial densities without resorting to discretiza-
tion of the problem. This is a definite advantage, since the choice of a
discretization introduces a certain bias, which is contrary to the spirit of the
Maximum Entropy method.
The choice of a family of feature functions remains a crucial question in
this field. The authors of this chapter are currently focusing on this issue,
which will be the subject of a subsequent publication.
Chapter 6
In the first part of this thesis (Chapter 2), we concentrated on applying Ba-
yesian inference to improve flight trajectory predictions: Bayesian inference
provides a powerful framework for incorporating prior knowledge and updat-
ing predictions based on new information. This allows for a more nuanced
and probabilistic approach to trajectory prediction. By leveraging Baye-
sian methods, we can better estimate the uncertainties inherent in trajectory
predictions and develop more effective strategies for managing air traffic in
increasingly congested airspace. Our BayesCoordLSTM model, which inte-
grates Bayesian optimization and coordinate transformation into a hybrid
ConvLSTM framework, has demonstrated superior performance in predict-
ing flight paths with higher accuracy and efficiency compared to existing
methods.
In the second part (Chapters 3 to 5), we focused on analyzing air traffic
complexity using concepts derived from information theory. We developed
a method to construct air traffic complexity maps in two parts: first, at
each point in the airspace, we estimate the angular probability density of the
headings of aircraft entering a circular cell centered at the considered point,
and then we evaluate the local complexity.
The first part, which is the subject of the article presented in Chapter 3,
uses the Maximum Entropy principle combined with the method of moments.
Fourier coefficients are estimated by empirical averages from trajectory data
over a fixed time window, and an angular density maximizing the entropy
while respecting the estimated moments is obtained via partially finite convex
programming. The result is a GvM (Generalized von Mises) density.
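As a rough illustration of this two-step estimation (empirical trigonometric moments, then the finite-dimensional dual), the sketch below uses synthetic headings and a simple grid quadrature; it is not the thesis code. The dual variable plays the role of the GvM parameters, and at the optimum the reconstructed density reproduces the estimated Fourier moments.

```python
import numpy as np
from scipy.optimize import minimize

# Trigonometric feature functions up to order 2; the MaxEnt density
# matching their moments is a GvM-type density.
def features(theta):
    return np.stack([np.cos(theta), np.sin(theta),
                     np.cos(2 * theta), np.sin(2 * theta)])

# Synthetic headings standing in for aircraft entering a cell
# (two dominant, roughly opposite flow directions).
rng = np.random.default_rng(1)
headings = np.concatenate([rng.normal(0.5, 0.3, 400),
                           rng.normal(0.5 + np.pi, 0.4, 300)]) % (2 * np.pi)

m = features(headings).mean(axis=1)      # empirical Fourier moments

grid = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
F = features(grid)                       # 4 x N feature matrix on the grid

# Dual objective: log-partition minus <lambda, m>; convex in lambda.
def dual(lam):
    logZ = np.log(np.exp(lam @ F).mean() * 2 * np.pi)
    return logZ - lam @ m

lam = minimize(dual, np.zeros(4), method="BFGS").x

# Primal-dual relationship: the MaxEnt density is exp(<lam, f>)/Z.
density = np.exp(lam @ F)
density /= density.sum() * (2 * np.pi / len(grid))

# The reconstructed density reproduces the estimated moments.
recon = (F * density).sum(axis=1) * (2 * np.pi / len(grid))
print(np.round(m, 3), np.round(recon, 3))
```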
The second part involves calculating a complexity index associated with
the obtained GvM density and the intensity of the flow traversing the cell.
The complexity associated with a given GvM is defined as a decreasing func-
tion of its geodesic distance to the uniform distribution. Calculating this
distance is theoretically possible but quite complex, so we approximate the
geodesic distance by the SKL (Symmetrized Kullback-Leibler) divergence. A
complexity map can then be constructed by multiplying the previous index
by the local traffic density.
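A minimal numerical sketch of this construction follows. The choice of exp(−SKL) as the decreasing function and the aircraft count of 12 are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Grid discretization of the circle for numerical integration.
grid = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
dtheta = 2 * np.pi / len(grid)

def skl_to_uniform(p):
    """Symmetrized Kullback-Leibler divergence between a density p
    (sampled on `grid`) and the uniform density 1/(2*pi)."""
    u = np.full_like(p, 1.0 / (2 * np.pi))
    kl_pu = np.sum(p * np.log(p / u)) * dtheta
    kl_up = np.sum(u * np.log(u / p)) * dtheta
    return kl_pu + kl_up

# Example GvM-type density with two modes (organized bidirectional flow).
p = np.exp(2 * np.cos(2 * grid))
p /= p.sum() * dtheta

# exp(-SKL) is one possible decreasing function of the divergence;
# the construction only requires some decreasing function.
index = np.exp(-skl_to_uniform(p))
aircraft_count = 12              # hypothetical number of aircraft in the cell
complexity = aircraft_count * index
print(round(complexity, 3))
```

A peaked density (organized flow) sits far from the uniform, so its index is small; a near-uniform heading distribution (disordered traffic) yields an index close to 1, and the count then scales the map by local traffic intensity.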
Currently, the local traffic density is simply estimated by counting the
number of aircraft crossing the considered cell. However, we questioned the
notion of traffic density. A more instantaneous notion than the one used in
the approach described here could make sense in air traffic management. This
motivated the writing of the article presented in Chapter 5. In this article,
we consider a spatial distribution of points (which could be the positions
of aircraft at a given moment) as resulting from a random draw according
to an unknown law. We then propose an estimation of this law using the
Maximum Entropy principle combined with the method of moments. Again,
partially finite convex programming forms the theoretical core of the study.
ations in trajectories, indicating unpredictability and increasing the overall
complexity of the air traffic system.
Efficient geodesic distance computation: Future work could investigate
optimizing algorithms for faster geodesic distance computation, incorporating
real-time data for dynamic updates, and leveraging parallel processing
to handle larger datasets. Exploring these areas may enhance both accuracy
and efficiency in practical applications.
This research has established a robust framework for predicting flight
trajectories and assessing air traffic complexity. As air traffic volumes in-
crease, the deployment of advanced analytical tools such as those developed
in this study will become increasingly critical. By continuing to refine and
expand upon our methodologies, we can contribute to the ongoing efforts
to enhance the safety, efficiency, and effectiveness of air traffic management
systems worldwide.
References
[1] Abbasimehr, Hossein and Paki, Reza. “Improving time series forecasting
using LSTM and attention models”. In: Journal of Ambient Intelligence and
Humanized Computing 13.1 (2022), pp. 673–691. issn: 1868-5137, 1868-5145.
[2] Abdar, Moloud et al. “A review of uncertainty quantification in deep learn-
ing: Techniques, applications and challenges”. In: Information fusion 76
(2021), pp. 243–297.
[3] García, Adrían and Delahaye, Daniel and Soler, Manuel. “Air traffic com-
plexity map based on linear dynamical systems”. In: hal-02512103 (2019),
pp. 1–35.
[4] Ahmed, Mohammed A and El-Sheimy, Naser and El-Bakry, Hosam and
Moussa, Ahmed. “Machine learning-based prediction of air traffic complex-
ity and delay”. In: Journal of Air Transport Management 77 (2019), pp. 97–
109.
[5] Alharbi, Abdulrahman and Petrunin, Ivan and Panagiotakopoulos, Dim-
itrios. “Deep learning architecture for UAV traffic-density prediction”. In:
Drones 7 (2023), pp. 1–31.
[6] Amari, Shun-ichi. Information geometry and Its applications. Applied Math-
ematical Sciences. Springer Japan, 2016. isbn: 9784431559788.
[7] Amari, Shun-ichi and Karakida, Ryo and Oizumi, Masafumi. “Information
geometry connecting Wasserstein distance and Kullback–Leibler divergence
via the entropy-relaxed transportation problem”. In: Information Geometry
1 (2018), pp. 13–37.
[8] Amersfoort, Joost and Smith, Lewis and Teh, Yee and Gal, Yarin. “Simple
and scalable epistemic uncertainty estimation using a single deep determin-
istic neural network”. In: ICML. 2020.
[9] Arribas, Ignacio and Rodas-Jordá, Alberto and Toro-Rueda, Jose M del and
Parés, Ferran. “Evaluation of air traffic complexity with geometric metrics
and machine learning techniques”. In: IEEE Access 9 (2021), pp. 40894–
40905.
[10] Atencia, Miguel and Stoean, Ruxandra and Joya, Gonzalo. “Uncertainty
quantification through dropout in time series prediction by echo state net-
works”. In: Mathematics 8.8 (2020). issn: 2227–7390.
[11] Banavar, Sridhar and Sheth, Kapil S. and Grabbe, Shon. “Airspace complex-
ity and its application in air traffic management”. In: 2nd USA/EUROPE
Air traffic management R&D seminar (1998).
[12] Beck, Amir. First-order methods in optimization. SIAM, 2017.
[13] Behmardi, Behrouz and Raich, Raviv and Hero, Alfred O. “Entropy esti-
mation using the principle of maximum entropy”. In: 2011 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP).
2011, pp. 2008–2011.
[14] Bender, Steven R and Smith, Andrew E. “A geometric airspace complexity
metric for unmanned aerial systems”. In: Journal of Intelligent & Robotic
Systems 83.1 (2016), pp. 121–133.
[15] Bing, Dong. “Aircrafts monitoring and early warning based on air traffic
complexity measure analysis”. In: The Open Automation and Control Sys-
tems Journal 6 (2014), pp. 1563–1569.
[16] Bogert, Kenneth. Notes on Generalizing the Maximum Entropy Principle to
Uncertain Data. 2022. arXiv: 2109.04530.
[17] Boomsma, Wouter and Ferkinghoff-Borg, Jesper and Lindorff-Larsen, Kresten.
“Combining experiments and simulations using the maximum entropy prin-
ciple”. In: PLoS computational biology 10.2 (2014), e1003406.
[18] Borwein, Jonathan and Lewis, Adrian. Convex Analysis. Springer, 2006.
[19] Borwein, Jonathan M and Lewis, Adrian S. “Partially finite convex program-
ming, Part I: Quasi relative interiors and duality theory”. In: Mathematical
Programming 57.1 (1992), pp. 15–48.
[20] Borwein, Jonathan M and Lewis, Adrian S. “Partially finite convex program-
ming, part II: Explicit lattice models”. In: Mathematical Programming 57.1
(1992), pp. 49–83.
[21] Botea, Mariana and Ganesh, Arvind. “Prediction of air traffic complexity us-
ing machine learning algorithms”. In: Journal of Air Transport Management
52 (2016), pp. 11–21.
[22] Broyden, Charles George. “The convergence of a class of double-rank mini-
mization algorithms 1. general considerations”. In: IMA Journal of Applied
Mathematics 6.1 (1970), pp. 76–90.
[23] Burg, John Parker. “The relationship between maximum entropy spectra
and maximum likelihood spectra”. In: Geophysics 37.2 (1972), pp. 375–376.
[24] Cai, Guowei and Chen, Ben M and Lee, Tong Heng. “Coordinate systems
and transformations”. In: Unmanned rotorcraft systems (2011), pp. 23–34.
[25] Caticha, Ariel and Giffin, Adom. “Updating probabilities”. In: AIP Confer-
ence Proceedings. Vol. 872. 1. American Institute of Physics. 2006, pp. 31–
42.
[26] Caticha, Ariel and Mohammad-Djafari, Ali and Bercher, Jean-François and
Bessière, Pierre. “Entropic Inference”. In: AIP Conference Proceedings. AIP,
2011.
[27] Chen, Jeng-Shyang and Wu, Chen-Chien and Wang, Chien-Hua and Su, Yu-
Chih and Lee, Cheng-Che. “Agent-based approach for assessing unmanned
aircraft systems integration in the national airspace system”. In: IEEE Ac-
cess 5 (2017), pp. 11151–11160.
[28] Chen, Yunfei and Chen, Linlin and Yao, Wei and Sun, Jian. “A novel machine
learning approach to air traffic complexity prediction and management”. In:
Transportation Research Part C: Emerging Technologies 102 (2019), pp. 251–
266.
[29] Chen, Zhengmao and Guo, Dongyue and Lin, Yi. “A deep Gaussian process-
based flight trajectory prediction approach and Its application on conflict
detection”. In: Algorithms 13.11 (2020), p. 293. issn: 1999-4893.
[30] Cheng, Cheng et al. “Machine learning-aided trajectory prediction and con-
flict detection for internet of aerial vehicles”. In: IEEE Internet of Things
Journal (2021). issn: 2327-4662, 2372-2541.
[31] Chung, Junyoung and Gülçehre, Çaglar and Cho, KyungHyun and Bengio,
Yoshua. “Gated feedback recurrent neural networks”. In: CoRR abs/1502.02367
(2015). arXiv: 1502.02367.
[32] Coppola, Gianluca and Mandolini, Marco and Pieroni, Alessio. “Airspace
complexity assessment using topological methods”. In: IEEE Access 8 (2020),
pp. 75711–75720.
[33] Council, National Research and Engineering, Division on and Sciences, Phys-
ical and Aeronautics and Board, Space Engineering and Autonomy Research
for Civil Aviation, Committee on. “Autonomy research for civil aviation: To-
ward a new era of flight”. In: National Academies Press 2.3 (2014).
[34] De Leege, Arjen and Paassen, Marinus van and Mulder, Max. “A machine
learning approach to trajectory prediction”. In: AIAA Guidance, Navigation,
and Control (GNC) Conference. 2013, pp. 1–10.
[35] Degas, Augustin et al. “A survey on artificial intelligence (AI) and explain-
able ai in air traffic management: Current trends and development with
future research trajectory”. In: Applied Sciences 12.3 (2022), p. 1295. issn:
2076-3417.
[36] Delahaye, Daniel and García, Adrían and Lavandier, Julien and Chaimatanan,
Supatcha and Soler, Manuel. “Air traffic complexity map based on linear dy-
namical systems”. In: Aerospace 9.5 (2022). issn: 2226-4310.
[37] Delahaye, Daniel and Puechmorel, Stéphane. “Air traffic complexity: To-
wards an intrinsic metric”. In: Proceeding of the 3rd USA/Europe Air Traffic
Management R and D Seminar. 2000.
[38] Delahaye, Daniel and Puechmorel, Stéphane. Modeling and optimization of
air traffic. John Wiley & Sons, 2013. isbn: 9781848215955.
[39] Delahaye, Daniel and Puechmorel, Stephane. “Air traffic complexity based
on dynamical systems”. In: 49th IEEE Conference on Decision and Control
(CDC). Atlanta, GA, USA: IEEE, Dec. 2010, pp. 2069–2074. isbn: 978-1-
4244-7745-6.
[40] Dias, Fernando HC and Rey, David. “Robust aircraft conflict resolution un-
der trajectory prediction uncertainty”. In: Operations Research Letters 50.5
(2022), pp. 503–508.
[41] Ding, Jingtao et al. Artificial intelligence for complex network: Potential,
methodology and application. 2024. arXiv: 2402.16887.
[42] Dudík, Miroslav and Phillips, Steven J and Schapire, Robert E. “Maximum
entropy density estimation with generalized regularization and an applica-
tion to species distribution modeling”. In: Journal of Machine Learning Re-
search (2007), pp. 1217–1260.
[43] Facchinei, Francisco and Lucidi, Stefano and Palagi, Laura. “A truncated
Newton algorithm for large scale box constrained optimization”. In: SIAM
Journal on Optimization 12 (1999).
[44] Fang, Jianqiang and Li, Xiaoyan and Wei, Na and Lv, Xiaofei. “Agent-based
simulation of collaborative decision making for air traffic management”. In:
IEEE Access 7 (2019), pp. 135695–135707.
[45] Fu, Jun and Zhou, Wei and Chen, Zhibo. “Bayesian spatio-temporal graph
convolutional network for traffic forecasting”. In: arXiv:2010.07498 [cs] (2020).
arXiv: 2010.07498.
[46] Gao, Yi and Tang, Tao and Li, Xiaoyan and Liu, Fang. “Agent-based ap-
proach for modeling and simulation of air traffic management in en-route
airspace”. In: IEEE Access 7 (2019), pp. 61689–61699.
[47] Gao, Zhenyu and Yu, Yue and Wei, Qinshuang and Topcu, Ufuk and Clarke,
John-Paul. Noise-aware and equitable urban air traffic management: An op-
timization approach. 2024. arXiv: 2401.00806.
[48] Gatto, Riccardo. “Some computational aspects of the generalized von Mises
distribution”. In: Statistics and Computing 18 (2008), pp. 321–331.
[49] Gatto, Riccardo and Jammalamadaka, Sreenivasa Rao. “The generalized von
Mises distribution”. In: Statistical Methodology 4.3 (2007), pp. 341–353. issn:
1572-3127.
[50] Girardin, Valerie. “Méthodes de réalisation de produit scalaire et de prob-
lème de moments avec maximisation d’entropie”. In: Studia Mathematica
3.124 (1997), pp. 199–213.
[51] Gomes-Gonçalves, Erika and Gzyl, Henryk and Mayoral, Silvia. “Density
reconstructions with errors in the data”. In: Entropy 16.6 (2014), pp. 3257–
3272.
[52] Gresele, Luigi and Marsili, Matteo. “On maximum entropy and inference”.
In: Entropy 19.12 (2017), pp. 1–16.
[53] Gzyl, Henryk. Joint probabilities under expected value constraints, trans-
portation problems, maximum entropy in the mean, and geometry in the
space of probabilities. 2021. arXiv: 2109.01166.
[54] Hestenes, Magnus R. “The conjugate-gradient method for solving linear
systems”. In: Numerical analysis 6 (1956), pp. 83–102.
[55] Hiriart-Urruty, Jean-Baptiste and Lemaréchal, Claude. Fundamentals of con-
vex analysis. Springer Science & Business Media, 2004.
[56] Hochreiter, Sepp and Schmidhuber, Jürgen. “Long short-term memory”. In:
Neural computation 9.8 (1997), pp. 1735–1780.
[57] Hong, Youkyung and Kim, Youdan and Lee, Keumjin. “Application of com-
plexity map to reduce air traffic complexity in a sector”. In: AIAA Guidance,
Navigation, and Control Conference. American Institute of Aeronautics and
Astronautics, 2014. isbn: 978-1-62410-317-9.
[58] Huang, Chiou-Jye and Kuo, Ping-Huan. “A deep CNN-LSTM model for
particulate matter (PM2.5) forecasting in smart cities”. In: Sensors 18.7
(2018).
[59] Huang, Jin and Ding, Weijie. “Aircraft trajectory prediction based on Baye-
sian optimised temporal convolutional network–bidirectional gated recurrent
unit hybrid neural network”. In: International Journal of Aerospace Engineer-
ing 2022 (2022). Ed. by Sobel, Kenneth M., pp. 1–19.
[60] Jaynes, Edwin T. “Information theory and statistical mechanics”. In: Phys-
ical review 106.4 (1957), p. 620.
[61] Jaynes, Edwin T. Probability theory: The logic of science. Cambridge uni-
versity press, 2003.
[62] Jones, Donald R. “A taxonomy of global optimization methods based on
response surfaces”. In: Journal of global optimization 21 (2001), pp. 345–
383.
[63] Juntama, Paveen and Chaimatanan, Supatcha and Alam, Sameer and Dela-
haye, Daniel. “A distributed metaheuristic approach for complexity reduction
in air traffic for strategic 4D trajectory optimization”. In: International
Conference on Artificial Intelligence and Data Analytics for Air Transporta-
tion. 2020, pp. 1–9.
[64] Kagan Abram, M. and Linnik Yuri, V. and Radhakrishna Rao, C. Charac-
terization problems in mathematical statistics. 1973.
[65] Khayyam, Hamid and Yildirim, Süleyman and Yücesan, Enver. “Compar-
ative evaluation of air traffic complexity metrics for en-route airspace”. In:
Transportation Research Part C: Emerging Technologies 119 (2020), p. 102725.
[66] Lakshmanan, Rajmadan and Pichler, Alois. “Soft Quantization Using En-
tropic Regularization”. In: Entropy 25.10 (2023), p. 1435.
[67] Laudeman, I. and Shelden, S. “Dynamic density: An air traffic management met-
ric”. In: Technical Report, NASA/TM-1998-112226; NASA: Boulder, CO,
USA (1998).
[68] Lee, Keumjin and Feron, Eric and Pritchett, Amy. “Describing airspace com-
plexity: Airspace response to disturbances”. In: Journal of Guidance, Con-
trol, and Dynamics 32.1 (2009), pp. 210–222. issn: 0731-5090, 1533-3884.
[69] Li, Chengyu and Chen, Qian and Yan, Xiaoming and Wang, Jin and Lin,
Xingwei. “Learning air traffic as images: A deep convolutional neural network
for airspace operation complexity evaluation”. In: IEEE Access 8 (2020),
pp. 19732–19740.
[70] Li, Huanhuan and Jiao, Hang and Yang, Zaili. “AIS data-driven ship tra-
jectory prediction modelling and analysis based on machine learning and
deep learning methods”. In: Transportation Research Part E: Logistics and
Transportation Review 175 (July 2023), pp. 1–39. issn: 13665545.
[71] Li, Qiang and Guan, Xinjia and Liu, Jinpeng. “A CNN-LSTM framework
for flight delay prediction”. In: Expert Systems with Applications 227 (Oct.
2023), pp. 1–16. issn: 09574174.
[72] Li, Weigang and Alves, CJP and Omar, N. “Knowledge-based system for
air traffic flow management: Timetable rescheduling and centralized flow
control”. In: Transactions on Information and Communications Technologies
(1993), pp. 655–670.
[73] Liu, Hong and Lin, Yi and Chen, Zhengmao and Guo, Dongyue and Zhang,
Jianwei and Jing, Hailong. “Research on the air traffic flow prediction using
a deep learning approach”. In: IEEE Access 7 (2019), pp. 148019–148030.
[74] Liu, Hua and Li, Xiaoyan and Wei, Na and Lv, Xiaofei. “Agent-Based Simu-
lation of Aircraft Taxiing Operations”. In: IEEE Access 5 (2017), pp. 18936–
18945.
[75] Liu, Yan and Zhang, Jiazhong. “Predicting traffic flow in local area networks
by the largest Lyapunov exponent”. In: Entropy 18.1 (2016), p. 32.
[76] Ma, Lan and Tian, Shan. “A hybrid CNN-LSTM model for aircraft 4D
trajectory Prediction”. In: IEEE Access 8 (2020), pp. 134668–134680. issn:
2169-3536.
[77] Maksimov, Vyacheslav Mikhailovich. “Necessary and sufficient statistics for
the family of shifts of probability distributions on continuous BiCompact
groups”. In: Theory of Probability & Its Applications 12.2 (1967), pp. 267–
280.
[78] Maréchal, Pierre. “A note on entropy optimization”. In: Approximation, Op-
timization and Mathematical Economics. Springer, 2001, pp. 205–211.
[79] Maréchal, Pierre. “On the Principle of Maximum Entropy on the Mean as
a methodology for the regularization of inverse problems”. In: Probability
Theory and Mathematical Statistics (1998).
[80] Maréchal, Pierre and Navarrete, Yasmín and Davis, Sergio. “On the founda-
tions of the maximum entropy principle using Fenchel duality for Shannon
and Tsallis entropies”. In: Physica Scripta 99.7 (2024), p. 075265.
[81] Marwala, Tshilidzi and Nelwamondo, Fulufhelo V. “A multi-agent system
for air traffic flow management: A hybrid approach”. In: IEEE Transactions
on Intelligent Transportation Systems 22.7 (2020), pp. 4369–4381.
[82] Mirikitani, Derrick and Nikolaev, Nikolay. “Recursive Bayesian recurrent
neural networks for time-series modeling”. In: Transactions on Neural Net-
works 21.2 (2010), pp. 262–274.
[83] Mondoloni, Stephane and Liang, Diana. “Airspace Fractal Dimensions and
Applications”. In: USA/Europe ATM Seminar. Dec. 2001.
[84] Moré, Jorge J. “Recent developments in algorithms and software for trust
region methods”. In: Mathematical Programming The State of the Art: Bonn
1982 (1983), pp. 258–287.
[85] Nelder, John A and Mead, Roger. “A simplex method for function minimiza-
tion”. In: The computer journal 7.4 (1965), pp. 308–313.
[86] Nghiem, Thi-Lich and Le, Viet-Duc and Le, Thi-Lan and Delahaye, Daniel
and Maréchal, Pierre. “Angular probability density reconstruction by Max-
imum Entropy”. preprint. July 2024.
[87] Pang, Yutian and Xu, Nan and Liu, Yongming. “Aircraft Trajectory Pre-
diction using LSTM neural network with embedded convolutional layer”.
In: Annual Conference of the PHM Society 11.1 (2019). issn: 2325-0178,
2325-0178.
[88] Pang, Yutian and Zhao, Xinyu and Hu, Jueming and Yan, Hao and Liu,
Yongming. “Bayesian Spatio-Temporal grAph tRansformer network (B-STAR)
for multi-aircraft trajectory prediction”. In: Knowledge-Based Systems 249
(Aug. 2022), p. 108998. issn: 09507051.
[89] Pérez Moreno, Francisco and Gómez Comendador, Víctor Fernando and
Delgado-Aguilera Jurado, Raquel and Zamarreño Suárez, María and Antulov-
Fantulin, Bruno and Arnaldo Valdés, Rosa María. “How has the concept of
air traffic complexity evolved? Review and analysis of the state of the art of
air traffic complexity”. In: Applied Sciences 14.9 (2024), pp. 1–21.
[90] Polpo, Adriano and Stern, Julio and Louzada, Francisco and Izbicki, Rafael
and Takada, Hellinton. “Bayesian inference and maximum entropy meth-
ods in science and engineering”. In: Springer Proceedings in Mathematics &
Statistics (2018).
[91] Powell, Michael JD. A direct search optimization method that models the
objective and constraint functions by linear interpolation. Springer, 1994.
[92] Powell, Michael JD. “An efficient method for finding the minimum of a func-
tion of several variables without calculating derivatives”. In: The computer
journal 7.2 (1964), pp. 155–162.
[93] Prandini, Maria and Piroddi, Luigi and Puechmorel, Stephane and Brazdilova,
Silvie Luisa. “Toward air traffic complexity assessment in new generation air
traffic management systems”. In: IEEE Transactions on Intelligent Trans-
portation Systems 12.3 (2011), pp. 809–818. issn: 1524-9050, 1558-0016.
[94] Qiao, Sun and Han, Nan and Zhu, Xin and Shu, Hong and Zheng, Jiao and
Yuan, Chang. “A dynamic trajectory prediction algorithm based on Kalman
Filter”. In: Acta Electronica Sinica 46 (2018), pp. 418–423.
[95] Qin, Wanting and Tang, Jun and Lao, Songyang. “DeepFR: A trajectory
prediction model based on deep feature representation”. In: Information Sci-
ences 604 (2022), pp. 226–248.
[96] Razvan, Bucuroiu and Stéphanie, Vincent. European Network Operations
Plan 2019 (2024) Rolling Seasonal Plan. Tech. rep. EUROCONTROL, 2019-
2024.
[97] Rényi, Alfréd. Probability theory. Courier Corporation, 2007.
[98] Rey, David and Rapine, Christophe and Fondacci, Remy and El Faouzi,
Nour-Eddin. “Subliminal speed control in air traffic management: Optimiza-
tion and simulation”. In: Transportation Science 50 (2016), pp. 240–262.
[99] Rivero, Cristian Rodriguez et al. “Time series forecasting using Recurrent
neural networks modified by Bayesian inference in the learning process”. In:
2019 IEEE Colombian Conference on Applications in Computational Intel-
ligence (ColCACI). Barranquilla, Colombia, 2019, pp. 1–6.
[100] Rockafellar, R Tyrrell. Convex analysis. Vol. 18. Princeton university press,
1970.
[101] Rockafellar, R Tyrrell. “Convex integral functionals and duality”. In: Con-
tributions to nonlinear functional analysis. Elsevier, 1971, pp. 215–236.
[102] Rockafellar, R Tyrrell. “Integral functionals, normal integrands and mea-
surable selections”. In: Nonlinear Operators and the Calculus of Variations.
Springer, 2006, pp. 157–207.
[103] Rockafellar, Ralph. “Integrals which are convex functionals”. In: Pacific
journal of mathematics 24.3 (1968), pp. 525–539.
[104] Rooij, Gijs de and Stienstra, Amber and Borst, Clark and Tisza, Adam B
and Paassen, Marinus M van and Mulder, Max. “Contributing factors to
flight-centric complexity in en-route air traffic control”. In: Proceedings of
the 15th USA/Europe Air Traffic Management Research and Development
Seminar. 2023.
[105] Salinas, David and Flunkert, Valentin and Gasthaus, Jan and Januschowski,
Tim. “DeepAR: Probabilistic forecasting with autoregressive recurrent net-
works”. In: International Journal of Forecasting 36.3 (2020), pp. 1181–1191.
issn: 0169-2070.
[106] Shafienya, Hesam and Regan, Amelia. “4D flight trajectory prediction based
on ADS-B data: A comparison of CNN-GRU models”. In: Aerospace Con-
ference (AERO). 2022, pp. 01–12.
[107] Shafienya, Hesam and Regan, Amelia C. “4D flight trajectory prediction
using a hybrid deep learning prediction method based on ADS-B technology:
A case study of Hartsfield–jackson Atlanta international airport (ATL)”. In:
Transportation Research Part C: Emerging Technologies 144 (2022), pp. 1–
12.
[108] Shang, Tongfei and Xiao, Kunpeng and Han, Kun. “Target operator tra-
jectory prediction method based on attention mechanism and LSTM”. In:
Journal of Physics: Conference Series 2037.1 (2021), pp. 1–6. issn: 1742-
6588.
[109] Shi, Xingjian and Chen, Zhourong and Wang, Hao and Yeung, Dit-Yan and
Wong, Wai-Kin and Woo, Wang-chun. “Convolutional LSTM network: A
machine learning approach for precipitation Nowcasting”. In: CoRR (2015).
[110] Shi, Xingjian and Chen, Zhourong and Wang, Hao and Yeung, Dit-Yan and
Wong, Wai-kin and Woo, Wang-chun. “Convolutional LSTM network: A
machine learning approach for precipitation nowcasting”. In: arXiv (2015).
eprint: 1506.04214.
[111] Shi, Zhiyuan and Xu, Min and Pan, Quan. “4D flight trajectory predic-
tion with constrained LSTM network”. In: IEEE Transactions on Intelligent
Transportation Systems 22.11 (2021), pp. 7242–7255. issn: 1524-9050.
[112] Siddique, Talha and Mahmud, Shaad and Keesee, Amy and Ngwira, Chigomezyo
and Connor, Hyunju. “A survey of uncertainty quantification in machine
learning for space weather prediction”. In: Geosciences 12.1 (2022). issn:
2076–3263.
[113] Spencer, D.A. “Applying artificial intelligence techniques to air traffic con-
trol automation”. In: The Lincoln Laboratory Journal 2.3 (1989), pp. 537–
554.
[114] Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. “Sequence to sequence
learning with neural networks”. In: arXiv (2014). eprint: 1409.3215.
[115] Teboulle, Marc. “A simplified view of first order methods for optimization”.
In: Mathematical Programming 170.1 (2018), pp. 67–96.
[116] Tran, Phu N. and Nguyen, Hoang Q. V. and Pham, Duc-Thinh and Alam,
Sameer. “Aircraft trajectory prediction with enriched intent using Encoder-
Decoder architecture”. In: IEEE Access 10 (2022), pp. 17881–17896.
[117] Tsallis, Constantino and Mendes, Renio S and Plastino, Anel R. “The role of
constraints within generalized nonextensive statistics”. In: Physica A: Sta-
tistical Mechanics and its Applications 261.3-4 (1998), pp. 534–554.
[118] Varun S., Sudarsanan. “Deep learning framework for trajectory prediction
and in-time prognostics in the terminal airspace”. PhD thesis. Purdue Uni-
versity Graduate School, 2022.
[119] Vidosavljevic, A. and Delahaye, D. and Sunil, E. and Bussink, F. and Hoek-
stra, J. “Complexity analysis of the concepts of urban airspace design for
METROPOLIS project”. In: ENRI Int. Workshop on ATM/CNS. Tokyo,
Japan. 2015.
[120] Wan, Guihong and Maung, Crystal and Schweitzer, Haim. “Improving the
accuracy of principal component analysis by the maximum entropy method”.
In: 31st International Conference on Tools with Artificial Intelligence (IC-
TAI). IEEE. 2019, pp. 808–815.
[121] Wang, Chengzhang and He, Yu and Xu, Jun and Zhou, Chunyang. “A ma-
chine learning approach for air traffic complexity prediction and manage-
ment”. In: Journal of Advanced Transportation 2019 (2019), pp. 1–15.
[122] Wang, Huai-zhi and Li, Gang-qiang and Wang, Gui-bin and Peng, Jian-chun
and Jiang, Hui and Liu, Yi-tao. “Deep learning based ensemble approach for
probabilistic wind power forecasting”. In: Applied Energy 188 (2017), pp. 56–
70. issn: 0306-2619.
[123] Wang, Xing and Jiang, Xinhua and Chen, Lifei and Wu, Yi. “KVLMM: A
trajectory prediction method based on a variable-order Markov model with
kernel smoothing”. In: IEEE Access 6 (2018), pp. 25200–25208.
[124] Wang, Zhengyi and Delahaye, Daniel and Farges, Jean-Loup and Alam,
Sameer. “Complexity optimal air traffic assignment in multi-layer transport
network for urban air mobility operations”. In: Transportation Research Part
C: Emerging Technologies 142 (Sept. 2022), pp. 1–23. issn: 0968090X.
[125] Wang, Zhengyi and Liang, Man and Delahaye, Daniel. “Short-term 4D tra-
jectory prediction using machine learning methods”. In: SID 2017, 7th SESAR
Innovation Days. 2017.
[126] Wei, Na and Li, Xiaoyan and Lv, Xiaofei and Yang, Xiaodong. “Agent-
based simulation of air traffic flow management strategies”. In: IEEE Access
6 (2018), pp. 32216–32228.
[127] West, Jon and Zhang, Yu and Yildirim, Tolga and Blanks, Brenden. “Agent-
based simulation of air traffic control operations for small unmanned aerial
systems”. In: IEEE Access 7 (2019), pp. 56115–56125.
[128] Winter, Joost C de and Verhagen, Wim J and Ellerbroek, Joost. “Agent-
based simulation of collaborative decision making in air traffic flow manage-
ment”. In: IEEE Transactions on Intelligent Transportation Systems 19.9
(2018), pp. 2771–2782.
[129] Wyndemere. “An evaluation of air traffic control complexity”. In: Technical
Report, NASA 2-14284; NASA: Boulder, CO, USA (1996).
[130] Xin, Liu and Hailong, Pei and Jianqiang, Li. “Trajectory prediction based
on particle filter application in mobile robot system”. In: 2008 27th Chinese
Control Conference. 2008, pp. 389–393.
[131] Xu, Zhengfeng and Zeng, Weili and Chu, Xiao and Cao, Puwen. “Multi-
aircraft trajectory collaborative prediction based on social Long Short-Term
Memory network”. In: Aerospace 8.4 (2021). issn: 2226-4310.
[132] Yang, Hang and Liu, Zhonghua and Xiao, Yuguang. “Evaluation of air traffic
complexity using entropy-based metrics”. In: Entropy 21.10 (2019), p. 958.
[133] Yang, Yandong and Hong, Weijun and Li, Shufang. “Deep ensemble learning
based probabilistic load forecasting in smart grids”. In: Energy 189 (2019),
pp. 1–12.
[134] Zalinescu, Constantin. Convex analysis in general vector spaces. World sci-
entific, 2002.
[135] Zeng, Weili and Chu, Xiao and Xu, Zhengfeng and Liu, Yan and Quan,
Zhibin. “Aircraft 4D trajectory prediction in civil aviation: A review”. In:
Aerospace 9.2 (2022). issn: 2226-4310.
[136] Zeng, Weili and Chu, Xiao and Xu, Zhengfeng and Liu, Yan and Quan,
Zhibin. “Aircraft 4D trajectory prediction in civil aviation: A review”. In:
Aerospace 9.2 (Feb. 2022), pp. 1–19. issn: 2226-4310.
[137] Zhang, Rongjie and Wu, Wenjuan and Li, Liang and Li, Zhaowei. “Machine
learning techniques for air traffic complexity evaluation”. In: Transportation
Research Part C: Emerging Technologies 111 (2020), pp. 463–479.
[138] Zhang, Tao and Gao, Yang and Zhang, Chengwei. “Short-term 4D trajectory
prediction based on KF joint EKF parameter identification”. In: Journal of
Civil Aviation University of China 34.5 (2016), pp. 1–4.
[139] Zhang, Xiaoge and Mahadevan, Sankaran. “Bayesian neural networks for
flight trajectory prediction and safety assessment”. In: Decision Support Sys-
tems 131 (Apr. 2020), p. 113246. issn: 01679236.
Appendix
A Computing conjugates
In this appendix, we give computation details for the conjugate functions
involved in the Maximum Entropy problem of Chapter 3. We start with the
conjugate of
\[
g_\circ(\eta_\circ, \eta) \;=\; -\frac{\alpha}{2}\,\|z - \eta\|_{\Sigma^{-1}}^2 \;-\; \delta_{\{1\}}(\eta_\circ).
\]
We have:
\[
\begin{aligned}
(g_\circ)^\star(\lambda_\circ, \lambda)
&= \inf_{\eta_\circ,\,\eta}\;\Big\{ \big\langle(\lambda_\circ,\lambda),(\eta_\circ,\eta)\big\rangle + \frac{\alpha}{2}\,\|z - \eta\|_{\Sigma^{-1}}^2 + \delta_{\{1\}}(\eta_\circ) \Big\} \\
&= \inf_{\eta_\circ}\;\Big\{ \lambda_\circ \eta_\circ + \delta_{\{1\}}(\eta_\circ) \Big\} \;+\; \inf_{\eta}\;\Big\{ \langle\lambda, \eta\rangle + \frac{\alpha}{2}\,\|z - \eta\|_{\Sigma^{-1}}^2 \Big\}.
\end{aligned}
\]
Clearly, the first infimum is attained at $\bar\eta_\circ = 1$, yielding the value $\lambda_\circ$. The
second infimum is attained at the point $\bar\eta$ where the gradient of the function
$\eta \mapsto \langle\lambda, \eta\rangle + \frac{\alpha}{2}\|z - \eta\|_{\Sigma^{-1}}^2$ vanishes. A straightforward computation shows that
this gradient is equal to $\lambda - \alpha\Sigma^{-1}(z - \eta)$, which implies that
\[
\bar\eta = z - \frac{1}{\alpha}\,\Sigma\lambda.
\]
It follows that
\[
\begin{aligned}
\inf_{\eta}\;\Big\{ \langle\lambda, \eta\rangle + \frac{\alpha}{2}\,\|z - \eta\|_{\Sigma^{-1}}^2 \Big\}
&= \Big\langle \lambda,\; z - \frac{1}{\alpha}\Sigma\lambda \Big\rangle + \frac{\alpha}{2}\Big\langle \frac{1}{\alpha}\Sigma\lambda,\; \Sigma^{-1}\,\frac{1}{\alpha}\Sigma\lambda \Big\rangle \\
&= \langle\lambda, z\rangle - \frac{1}{2\alpha}\,\langle\lambda, \Sigma\lambda\rangle.
\end{aligned}
\]
Putting things together, we obtain:
\[
(g_\circ)^\star(\lambda_\circ, \lambda) = \lambda_\circ + \langle\lambda, z\rangle - \frac{1}{2\alpha}\,\langle\lambda, \Sigma\lambda\rangle.
\]
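As a sanity check, the closed form can be compared with a direct numerical minimization of the inner objective on randomly generated data (a sketch; the matrix Σ is simply built to be positive definite):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, alpha = 4, 1.7
z = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)      # symmetric positive definite
Sinv = np.linalg.inv(Sigma)
lam = rng.normal(size=n)
lam0 = 0.3                            # value of the eta0-part at eta0 = 1

# Inner objective: <lam, eta> + (alpha/2) * ||z - eta||^2 in the
# Sigma^{-1} (Mahalanobis) norm.
def objective(eta):
    d = z - eta
    return lam @ eta + 0.5 * alpha * d @ Sinv @ d

# Numerical infimum over eta, plus lam0 from the first infimum.
num = lam0 + minimize(objective, np.zeros(n)).fun

# Closed-form conjugate derived above.
closed = lam0 + lam @ z - (lam @ Sigma @ lam) / (2 * alpha)

print(round(num, 6), round(closed, 6))
```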
Next, we address the case where the estimated covariance matrix is sin-
gular. In this case, the inverse covariance matrix does not exist and the
Mahalanobis distance degenerates. We check here that the appropriate for-
mula is that given in Equation (3.6).
Proof. By definition,
\[
\varphi^\star(\xi) = \sup_{x}\;\Big\{ \langle\xi, x\rangle - \frac{1}{2}\,\langle x, Qx\rangle \Big\}.
\]
The gradient of the function to be supremized is equal to $\xi - Qx$. If $\xi \in \operatorname{ran} Q$,
then the gradient vanishes at any $\bar x$ such that $Q\bar x = \xi$, for example at
$\bar x = Q^\dagger \xi$, since $QQ^\dagger \xi = \xi$ (recall that $QQ^\dagger$ is the orthogonal projection
onto the range of $Q$). Then,
\[
\varphi^\star(\xi) = \big\langle \xi, Q^\dagger\xi \big\rangle - \frac{1}{2}\big\langle Q^\dagger\xi, QQ^\dagger\xi \big\rangle = \frac{1}{2}\big\langle \xi, Q^\dagger\xi \big\rangle.
\]
If $\xi \notin \operatorname{ran} Q$, then pick $x_\circ \in \ker Q \setminus \{0\}$. For every $t \in \mathbb{R}$,
\[
\langle\xi, t x_\circ\rangle - \frac{1}{2}\,\langle t x_\circ, Q(t x_\circ)\rangle = t\,\langle\xi, x_\circ\rangle. \tag{6.1}
\]
Since $\ker Q = (\operatorname{ran} Q^\star)^\perp = (\operatorname{ran} Q)^\perp$ and we assumed that $\xi \notin \operatorname{ran} Q$,
the vector $x_\circ$ can be chosen so that $\langle\xi, x_\circ\rangle \neq 0$, which shows that the left-hand
side in Equation (6.1) can be made as large as desired. This implies
that $\varphi^\star(\xi) = \infty$ in this case.
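Both cases can be checked numerically with the Moore-Penrose pseudoinverse (a sketch on a randomly generated rank-deficient Q):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 3))
Q = B @ B.T                      # symmetric positive semidefinite, rank 3

Qdag = np.linalg.pinv(Q)         # Moore-Penrose pseudoinverse

# Case 1: xi in ran Q, where phi*(xi) = (1/2) <xi, Q^dagger xi>.
xi = Q @ rng.normal(size=5)
phi_star = 0.5 * xi @ Qdag @ xi
x_bar = Qdag @ xi                # a maximizer of <xi, x> - (1/2) <x, Qx>
val = xi @ x_bar - 0.5 * x_bar @ Q @ x_bar
print(np.isclose(phi_star, val))

# Case 2: xi with a component outside ran Q; the objective is unbounded
# along a kernel direction, so phi*(xi) = +infinity.
w, V = np.linalg.eigh(Q)
x0 = V[:, 0]                     # eigenvector with (numerically) zero eigenvalue
xi_out = xi + x0
t = 1e6
grow = xi_out @ (t * x0) - 0.5 * (t * x0) @ Q @ (t * x0)
print(grow > 1e5)
```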
Finally, for $h(t) = t \ln t$, we have by definition
\[
h^\star(\tau) = \sup_{t \in \mathbb{R}}\;\{\tau t - h(t)\}.
\]
The supremum is attained at the point $\bar t$ where
\[
\tau - h'(\bar t) = \tau - \ln \bar t - 1 = 0,
\]
that is, at $\bar t = e^{\tau - 1}$, so that
\[
h^\star(\tau) = \tau \bar t - h(\bar t) = \tau e^{\tau - 1} - e^{\tau - 1}(\tau - 1) = e^{\tau - 1}.
\]
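A quick numerical check of this conjugate (maximizing τt − t ln t over a bounded interval; the bounds and the value of τ are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# h(t) = t * log(t); its conjugate should be h*(tau) = exp(tau - 1).
def h(t):
    return t * np.log(t)

tau = 0.8
res = minimize_scalar(lambda t: -(tau * t - h(t)),
                      bounds=(1e-9, 50.0), method="bounded")
print(round(-res.fun, 6), round(np.exp(tau - 1), 6))
```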