
Contributions à la gestion du trafic aérien : prédiction des trajectoires par réseaux de neurones et cartes de complexité par maximum d’entropie

Thi Lich Nghiem

To cite this version:

Thi Lich Nghiem. Contributions à la gestion du trafic aérien : prédiction des trajectoires par réseaux de neurones et cartes de complexité par maximum d’entropie. Réseaux et télécommunications [[Link]]. Université de Toulouse, 2024. Français. NNT : 2024TLSES121. tel-04876849

HAL Id: tel-04876849
[Link]
Submitted on 9 Jan 2025

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Doctorat de l’Université de Toulouse, prepared at the Université Toulouse III - Paul Sabatier

Contributions à la gestion du trafic aérien : prédiction des trajectoires par réseaux de neurones et cartes de complexité par maximum d’entropie

Thesis presented and defended on 16 September 2024 by

Thi-Lich NGHIEM

Doctoral school: EDMITT - École Doctorale Mathématiques, Informatique et Télécommunications de Toulouse
Speciality: Informatique et Télécommunications
Research unit: IMT : Institut de Mathématiques de Toulouse
Thesis supervised by Pierre MARÉCHAL and Thi-Lan LE

Composition of the jury
M. Eric Marie FERON, President, King Abdullah University of Science and Technology
M. Duc-Dung NGUYEN, Reviewer, Vietnam Academy of Science and Technology
Mme Valérie GIRARDIN, Reviewer, Université Caen Normandie
M. Denis KOUAMÉ, Examiner, Université Toulouse III - Paul Sabatier
M. Pierre MARÉCHAL, Thesis supervisor, Université Toulouse III - Paul Sabatier
Mme Thi-Lan LE, Thesis co-supervisor, Hanoi University of Science and Technology

Invited members
M. Daniel Delahaye, École Nationale de l’Aviation Civile
THESIS
Submitted for the award of the

DOCTORAT DE L’UNIVERSITÉ DE TOULOUSE

Delivered by: l’Université Toulouse 3 Paul Sabatier (UT3 Paul Sabatier)

Presented and defended on 16/09/2024 by:

Thi Lich NGHIEM
Contributions à la gestion du trafic aérien : prédiction des trajectoires par réseaux de neurones et cartes de complexité par maximum d’entropie

JURY
Valérie Girardin, Professeure des Universités, Reviewer
Duc-Dung Nguyen, Maître de Conférences, Reviewer
Eric Marie Feron, Professeur des Universités, Examiner
Denis Kouamé, Professeur des Universités, Examiner
Daniel Delahaye, ENAC, Invited member
Pierre Maréchal, Professeur des Universités, Thesis supervisor
Thi-Lan Le, Maîtresse de Conférences, Thesis co-supervisor

Doctoral school and speciality:
MITT : Informatique et Télécommunications
Research unit:
Laboratoire de Recherche ENAC
Thesis supervisors:
Pierre Maréchal and Thi-Lan Le
Reviewers:
Valérie Girardin and Duc-Dung Nguyen
Résumé

In the context of air traffic management (ATM), the growing demand for airspace has placed considerable pressure on existing systems. This growing demand calls for the urgent design and implementation of reinforced safety measures and optimized airspace efficiency to cope with increasing traffic volumes. Air traffic complexity is determined by the real-time traffic flow within a given airspace, including the number of flights and their relative positions. Owing to the constraints of airspace design, flights must modify their trajectories to fit within these limits, which leads to continual changes in traffic complexity. Consequently, accurately measuring the dynamic complexity of air traffic is a difficult task. This thesis aims to explore the difficulties of flight trajectory prediction and air traffic complexity estimation by leveraging machine learning methodologies, the Maximum Entropy principle, and information geometry.

Flight trajectory prediction is essential in aviation, as it supports efficient air traffic management and guarantees safe and smooth flight operations. Traditional prediction methods often fail to capture the complex spatio-temporal dependencies and uncertainties specific to the aviation sector. We propose a new model, named BayesCoordLSTM. This hybrid model integrates coordinate transformation and a Bayesian method into ConvLSTM models. By combining the spatial features extracted by a convolutional neural network (CNN) with the temporal dependencies identified by a Long Short-Term Memory (LSTM) network, the model considerably improves trajectory prediction accuracy. The incorporation of the Bayesian method provides probabilistic trajectory forecasts with confidence levels, while the coordinate transformation improves spatial awareness and predictive performance. We use a real flight dataset for the implementation. Our results demonstrate the marked superiority of the BayesCoordLSTM model over existing methods. Across various flight scenarios and conditions, this model consistently outperforms the baseline models in terms of trajectory prediction accuracy. The use of Bayesian optimization not only streamlines hyperparameter tuning but also yields promising gains in predictive performance.

Moreover, in air traffic complexity estimation, reconstructing angular probability density functions is crucial because it makes it possible to model and assess the spatial distribution of air traffic flow. This approach allows airspace congestion to be evaluated and complex traffic patterns to be anticipated, thereby facilitating more effective air traffic management strategies. In this thesis, we propose to use the Maximum Entropy principle to generate detailed traffic complexity maps from trajectory data, exploiting the statistical properties of the Fourier coefficients. The method is validated by numerical simulations, demonstrating its ability to accurately represent angular distributions from simple to complex scenarios.

Using historical flight and air traffic data, the reconstruction of angular probability density functions by maximum entropy is proposed and then used to build air traffic complexity maps. Information geometry underlies the assessment of local complexity. By discretizing the airspace and applying Fenchel duality to reconstruct the angular densities, we obtain a local density in the class of Generalized von Mises densities. The local complexity index is then computed from the geodesic distances between the obtained distribution and the uniform distribution (which is, of course, associated with the greatest complexity). The use of the symmetrized Kullback-Leibler divergence guarantees computational efficiency, enabling practical application in ATM systems. These complexity maps provide essential information on traffic patterns and systemic responses, supporting strategic management.

In conclusion, this thesis proposes effective approaches for solving certain problems in ATM, in particular flight trajectory prediction and air traffic complexity assessment.

Keywords: Air traffic management, Air traffic complexity, Neural networks, Trajectory prediction, Information geometry, Maximum Entropy.

Abstract
In the context of air traffic management (ATM), the increasing demand for
airspace has exerted substantial pressure on existing systems. This grow-
ing demand necessitates the urgent development and implementation of en-
hanced safety measures and optimized airspace efficiency to accommodate
escalating traffic volumes. The complexity of air traffic is determined by
the real-time flow of traffic within a given airspace, including the number of flights and their relative positions. Due to the constraints of airspace design, flights must alter their trajectories to fit within these boundaries, resulting
in ongoing changes in traffic complexity. As a result, precisely measuring
the dynamic complexity of air traffic is a challenging task. This thesis aims
to explore the difficulties in flight trajectory prediction and air traffic complexity estimation by leveraging machine learning methodologies, the Maximum Entropy principle, and information geometry.
Predicting flight trajectories is essential in aviation, as it supports effi-
cient air traffic management and ensures safe and smooth flight operations.
Traditional prediction methods often fall short in capturing the complex
spatio-temporal dependencies and uncertainties specific to the aviation sec-
tor. We propose a novel model named BayesCoordLSTM. This hybrid model integrates coordinate transformation and a Bayesian method into ConvLSTM models. By combining the spatial features extracted by the Con-
volutional Neural Network (CNN) with the temporal dependencies identi-
fied by Long Short-Term Memory (LSTM), the model significantly enhances
trajectory prediction accuracy. The incorporation of the Bayesian method offers probabilistic trajectory forecasts with confidence levels, while coordinate transformation improves spatial awareness and predictive performance. We use a real flight dataset for implementation; our results demonstrate the remarkable superiority of the BayesCoordLSTM model over existing methods. Across
various flight scenarios and conditions, this model consistently outperforms
baseline models in terms of trajectory prediction accuracy. The utilization of
Bayesian optimization not only streamlines hyperparameter tuning but also
yields promising results in terms of predictive performance enhancement.
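Although the full BayesCoordLSTM architecture is specified in Chapter 2, a minimal sketch may help fix ideas. The following PyTorch fragment is our own illustrative assumption, not the thesis model: it chains a 1-D convolution (spatial feature extraction) with an LSTM (temporal dependencies) and uses Monte-Carlo dropout as a simple stand-in for the Bayesian component that produces forecasts with confidence levels; the coordinate transformation step is omitted, and all layer sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

class HybridTrajectoryNet(nn.Module):
    """Illustrative CNN + LSTM + MC-dropout sketch (not the thesis model)."""

    def __init__(self, n_features=4, hidden=64, horizon=1):
        super().__init__()
        # 1-D convolution over the time axis extracts local spatial patterns
        # from the (latitude, longitude, altitude, speed, ...) channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # The LSTM captures temporal dependencies in the convolved sequence.
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.drop = nn.Dropout(p=0.2)  # kept stochastic at test time for MC sampling
        self.head = nn.Linear(hidden, n_features * horizon)

    def forward(self, x):  # x: (batch, time, features)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(z)
        return self.head(self.drop(out[:, -1]))

def mc_predict(model, x, n_samples=50):
    """Monte-Carlo dropout: repeated stochastic passes give a mean forecast
    and a dispersion that can be read as a confidence level."""
    model.train()  # keep dropout active
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)
```

A call such as mc_predict(model, window) then returns a point forecast together with a per-coordinate spread, mirroring the probabilistic forecasts with confidence levels described above.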

In addition, in air traffic complexity estimation, reconstructing angu-
lar probability density functions is crucial as it helps model and assess the
spatial distribution of air traffic flow. This approach helps assess airspace
congestion and predict complex traffic patterns, facilitating more effective
air traffic management strategies. In this dissertation, we propose to use the
Maximum Entropy principle to generate detailed traffic complexity maps from
trajectory data, leveraging the statistical properties of Fourier coefficients.
The method is validated through numerical simulations, demonstrating its
ability to accurately represent angular distributions from simple to complex
scenarios.
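Schematically, and with notation chosen here purely for illustration (the thesis's own formulation in Chapter 3 may differ in normalization and constraint handling), the reconstruction just described is a Maximum Entropy program over angular densities subject to trigonometric (Fourier) moment constraints:

\[
\max_{p \ge 0}\; -\int_0^{2\pi} p(\theta)\,\ln p(\theta)\,d\theta
\quad \text{s.t.} \quad
\int_0^{2\pi} p(\theta)\,d\theta = 1, \qquad
\int_0^{2\pi} p(\theta)\,e^{\mathrm{i} n\theta}\,d\theta = c_n,\ n = 1,\dots,N,
\]

whose solution takes the exponential form

\[
p^{\star}(\theta) \;\propto\; \exp\!\Big(\sum_{n=1}^{N}\big(\lambda_n \cos n\theta + \mu_n \sin n\theta\big)\Big),
\]

a Generalized von Mises density of order \(N\), with the multipliers \((\lambda_n,\mu_n)\) obtained by solving the dual problem via Fenchel duality.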
Using historical flight and airspace traffic data, reconstructing angular probability density functions improves predictions of congestion areas and complexity levels. We then develop a novel methodology for constructing air traffic complexity maps using information geometry. By discretizing the airspace and applying the Generalized von Mises
distribution (GvM) alongside Fenchel duality, local traffic complexity indices
are calculated based on geodesic distances between distributions. The use of
symmetrized Kullback-Leibler divergence ensures computational efficiency,
enabling practical application in ATM systems. These complexity maps pro-
vide critical insights into traffic patterns and systemic responses, supporting
strategic management.
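For reference, the symmetrized Kullback-Leibler divergence mentioned above is, in one standard convention (the factor 1/2 is sometimes omitted),

\[
D_{\mathrm{SKL}}(p,q) \;=\; \tfrac{1}{2}\big(D_{\mathrm{KL}}(p\,\|\,q) + D_{\mathrm{KL}}(q\,\|\,p)\big),
\qquad
D_{\mathrm{KL}}(p\,\|\,q) = \int_0^{2\pi} p(\theta)\,\ln\frac{p(\theta)}{q(\theta)}\,d\theta .
\]

In the complexity maps, taking \(q\) to be the uniform density \(1/2\pi\), which the Résumé associates with maximal disorder, yields a cheap local complexity index without computing geodesic distances exactly.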
In conclusion, this dissertation proposes efficient approaches to solve some problems in ATM, especially flight trajectory prediction and air traffic complexity assessment.

Keywords: Air traffic management, Air traffic complexity, Neural networks, Prediction of trajectories, Information geometry, Maximum Entropy.

Acknowledgments
I would like to express my deepest and most heartfelt gratitude to those who
have accompanied and supported me throughout the challenging journey of
completing my PhD. This process has been an experience filled with both
intellectual and emotional trials, and during the most difficult moments, I
have been fortunate to receive unwavering encouragement and support from
many people.
First and foremost, I extend my profound thanks to my advisors, Prof.
Pierre Maréchal and Assoc. Prof. Thi-Lan Le. They have been more than
supervisors; they have been true mentors and steadfast companions on this
journey. Their guidance, encouragement, and belief in my work helped me
navigate every obstacle and bring this thesis to completion.
I am deeply grateful to my referees, Prof. Valérie Girardin and As-
soc. Prof. Duc-Dung Nguyen. Their thorough evaluation of my work, their
insightful critique, and their detailed, thoughtful feedback have had an im-
mense impact not only on this thesis but on shaping my future work. Their
contributions have been invaluable, and I am deeply thankful for the time
and effort they devoted to this process.
I also sincerely appreciate the other esteemed members of my thesis jury,
Prof. Eric Marie Feron, Prof. Denis Kouamé, and Prof. Daniel Delahaye,
for their time, insightful comments, and constructive suggestions, which have
enriched my work and inspired new ideas for future endeavors.
Special thanks to Prof. Stéphane Puechmorel and Assoc. Prof. Andrija
Vidosavljevic for their invaluable support and mentorship. I would also like
to thank my friends in Z108/Z109, who have shared countless memorable
moments over coffee and lunch breaks, bringing joy and balance to my days.
I am deeply thankful to ENAC (École Nationale de l’Aviation Civile),
University of Toulouse III - Paul Sabatier and HUST (Hanoi University of
Science and Technology) for providing exceptional research facilities, and
particularly to ENAC for the financial support that made this work possible.

I extend my appreciation to all members of the Computer Vision De-
partment, MICA Research Institute, HUST for their willingness to assist and
offer guidance whenever needed.
I would also like to thank my Vietnamese friends in France: Thinh, Nga,
Hung and Lam, who stood by me during the intense pressures of thesis com-
pletion. My heartfelt thanks go to my colleagues at Thuongmai University
for their support.
A special note of gratitude goes to the families of Christine, Anne, and
An. Their kindness and hospitality made me feel like a part of their families,
providing warmth and comfort during some of the most difficult times.
Lastly, but most importantly, I express my deepest love and appreciation
to my parents, my husband Khiem, and my children Minh and Ngoc. Their
unconditional love, patience, and belief in me have been the foundation upon
which I have built this achievement. I am also grateful to my sisters for their
constant understanding and companionship on this long journey.
Each of these people has played an irreplaceable role in my life, and I
will forever cherish their support and kindness.

Contents

Résumé i

Abstract iii

Acknowledgments v

Abbreviations xv

Publications xvii

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Air traffic complexity . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Factors contributing to air traffic complexity . . . . . . 11
1.3.2 Air traffic complexity metrics . . . . . . . . . . . . . . 12
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Structure of the thesis . . . . . . . . . . . . . . . . . . . . . . 18

2 Applying the Bayesian method to improve flight trajectory prediction 21
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . 30
2.3.1 LSTM architecture . . . . . . . . . . . . . . . . . . . . 30
2.3.2 LSTM operation . . . . . . . . . . . . . . . . . . . . . 31
2.4 Proposed model . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1 Coordinate transformation . . . . . . . . . . . . . . . . 34
2.4.2 CNN layer . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.3 Bayesian optimization . . . . . . . . . . . . . . . . . . 36
2.5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.1 Data preparation . . . . . . . . . . . . . . . . . . . . . 39
2.5.2 Hyperparameter tuning . . . . . . . . . . . . . . . . . . 42
2.5.3 Evaluation metrics . . . . . . . . . . . . . . . . . . . . 42
2.5.4 Comparison of experimental results . . . . . . . . . . . 43
2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3 Reconstructing angular probability density by using Maximum Entropy 49
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.1 Computing moments . . . . . . . . . . . . . . . . . . . 55
3.3.2 Formulating the Maximum Entropy problem . . . . . . 56
3.4 Duality in action . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.1 Review of useful convex analysis . . . . . . . . . . . . . 59
3.4.2 Computing Maximum Entropy densities . . . . . . . . 64
3.4.3 Deriving an algorithm . . . . . . . . . . . . . . . . . . 66
3.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.5.1 Validation with Dirac distributions . . . . . . . . . . . 70
3.5.2 Numerical implementations . . . . . . . . . . . . . . . 72
3.6 Towards air traffic complexity estimation . . . . . . . . . . . . 79
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4 Building complexity maps of airspace via information geometry 83
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 Deriving a complexity index via information geometry . . . . . 85
4.2.1 Maximum Entropy angular densities . . . . . . . . . . 86
4.2.2 The Fisher information metric . . . . . . . . . . . . . . 89
4.3 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 99
4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5 Maximum Entropy multivariate density estimation 109


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Statement of the problem . . . . . . . . . . . . . . . . . . . . 110
5.3 Maximum Entropy solutions . . . . . . . . . . . . . . . . . . . 113
5.4 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6 Conclusion and further work 123


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

6.2 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

References 138

Appendix
A Computing conjugates . . . . . . . . . . . . . . . . . .

List of Figures

1.1 One-day traffic density of trajectories across Europe [96]. . . . 4


1.2 Some proposed approaches to evaluate the air traffic complexity. 8
1.3 An example of three traffic scenarios classified in order of increasing complexity [38]. . . . . . . . . . . . . . . . . . . . . . . 14

2.1 Different stages of a flight [135]. . . . . . . . . . . . . . . . . . 25


2.2 LSTM architecture [56]. . . . . . . . . . . . . . . . . . . . . . 30
2.3 The BayesCoordLSTM framework to predict flight trajectory. 33
2.4 Optimizing hyperparameters using the Bayesian method. . . . 37
2.5 An example of addressing missing data before (left) and after (right) using cubic spline interpolation. . . . . . . . . . . . . . 41
2.6 Sliding time window. . . . . . . . . . . . . . . . . . . . . . . . 41
2.7 A comparison of three configurations of the CNN-LSTM model. 46

3.1 An example map of trajectories through a selected cell and the angular histogram of the selected cell. . . . . . . . . . . . . 55
3.2 Kullback–Leibler relative entropy versus highest frequency. . . 68
3.3 Simulated angular samples based on an original probability p◦ . . 71
3.4 Gradient norm when N = 40. . . . . . . . . . . . . . . . . . . 72
3.5 Residual values for α from 1 to 10000. . . . . . . . . . . . . . 73

3.6 Results on the Original and Optimal densities with two peaks. 74
3.7 Results on the Original and Optimal densities with five peaks. 75
3.8 Comparison of computational time by seven optimal models with five peaks. . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.1 Example of a cell around a grid point in the airspace, with trajectories across ATL on December 26, 2018. . . . . . . . . . 87
4.2 The GvMN with different β values and peak numbers: top to bottom, β = 0.01 and β = 0.3; left: 1 peak, right: 20 peaks. . 99
4.3 Traffic density of trajectories across ATL on December 26, 2018. 101
4.4 All flight trajectories across ATL on December 26, 2018. . . . 102
4.5 Complexity maps on ATL airport dataset. . . . . . . . . . . . 103
4.6 High complexity cell: many flights, diverse directions. . . . . . 105
4.7 Low complexity cell: few flights, few directions. . . . . . . . . 105
4.8 Medium complexity cell: moderate flights, moderate directions. 106
4.9 High complexity cell: fewer flights, diverse directions. . . . . . 106

5.1 Plots of εψβ(η) and its conjugate for β = 1 and various values of ε. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

List of Tables

2.1 4D Trajectory data features . . . . . . . . . . . . . . . . . . . 39


2.2 An example of 4D Flight trajectory information. . . . . . . . . 40
2.3 Basic parameters of proposed model. . . . . . . . . . . . . . . 43
2.4 Comparison with state-of-the-art methods. . . . . . . . . . . . 44

4.1 The simulation results . . . . . . . . . . . . . . . . . . . . . . 98

Abbreviations
Abbreviation Meaning
ADS-B Automatic Dependent Surveillance Broadcast
ATCOs Air Traffic Controllers
ATFM Air Traffic Flow Management
ATL ATLanta International Airport
ATM Air Traffic Management
BCNN Bayesian Convolutional Neural Network
BFGS Broyden-Fletcher-Goldfarb-Shanno
BNN Bayesian Neural Network
CCI Composite Complexity Index
CG Conjugate Gradient
CG3D CNN-GRU with a 3-Dimensional Convolution
CNN Convolutional Neural Network
COBYLA Constrained Optimization BY Linear Approximations
DNN Deep Neural Network
FIM Fisher Information Metric
GvM Generalized von Mises
GRU Gated Recurrent Unit
LSTM Long Short-Term Memory
MAE Mean Absolute Error
NEXTGEN NEXT GENeration Air Transportation System
PDFs Probability Density Functions
QC Qualification of Constraints
ReLU Rectified Linear Unit
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SESAR Single European Sky ATM Research
SKL Symmetrized Kullback-Leibler
TNC Truncated Newton Conjugate-Gradient
Trust-Constr Trust Region Constraint

Publications
1 Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, “Quantitative evalua-
tion of robustness of Bayesian CNN models for image classification”,
National Conference on Some selected issues of information and com-
munication technology, Thai Nguyen, Vietnam, 2021, pp. 454-460,
ISBN 978-604-67-1744-7.

2 Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Pierre Maréchal, Daniel Delahaye, Andrija Vidosavljevic, “Applying Bayesian inference in a hy-
brid CNN-LSTM model for time-series prediction”, International Con-
ference on Multimedia Analysis and Pattern Recognition (MAPR), Phu
Quoc, Vietnam, 2022, pp. 1-6, DOI: 10.1109/MAPR56351.2022.9924783.

3 Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Pierre Maréchal, Daniel Delahaye, “Improving flight trajectory predictions with Bayesian-optimized
ConvLSTM models,” International Conference on Intelligent Systems
and Networks (ICISN), Hanoi, Vietnam, 2024, pp. 604-614, DOI:
10.1007/978-981-97-5504-2_70.

4 Thi-Lich Nghiem, Viet-Duc Le, Thi-Lan Le, Daniel Delahaye, Pierre Maréchal, “Angular probability density reconstruction by maximum
entropy - Towards air traffic complexity estimation,” submitted, 2024.

5 Pierre Maréchal, Thi-Lich Nghiem, “A note on maximum entropy multivariate density estimation,” submitted, 2024.

6 Pierre Maréchal, Thi-Lich Nghiem, Stéphane Puechmorel, “Building complexity maps of airspace via information geometry,” submitted, 2024.

Chapter 1

Introduction
This chapter presents a comprehensive exploration of air traffic complexity,
delving into the intricate operational dynamics of air traffic management. It
includes a detailed analysis of the factors that contribute to the heightened
complexity of air traffic, as well as an intensive discussion of the metrics used
to assess this complexity. This discussion is the foundational motivation for
our study, which is presented together with the objectives and scope outlined
in the dissertation. The significance of our contributions is briefly addressed
here, with a more detailed examination to follow in subsequent chapters.
Additionally, we provide an overview of the thesis structure to clarify its
organization and progression.

1.1 Motivation

In the context of air traffic management (ATM), the growing demand for
airspace has placed unprecedented pressure on existing systems. This expo-
nential growth in airspace demands not only highlights the critical need for
enhanced safety measures but also requires efficient optimization of airspace
to accommodate increasing air traffic volumes.
Air traffic management faces numerous complex challenges that impact
operational efficiency and safety. The structure of current airspace configura-
tions, characterized by rigid layouts and fixed sector divisions, significantly
contributes to congestion and inefficiencies. This complexity encompasses
both physical aspects, such as airport layouts and airway structures, and
operational dynamics involving the intricate interactions of aircraft within
these frameworks [68].

Human factors further exacerbate these challenges, as air traffic con-
trollers (ATCOs) manage traffic within designated sectors while grappling
with cognitive limitations and the increasing complexity of traffic patterns [36].
The uneven distribution of traffic and the inflexible configuration of airspace
sectors create congestion hotspots, necessitating strategic measures to optimize airspace usage and mitigate safety risks [63].
Within the broader context of air traffic complexity, accurate prediction
of aircraft trajectories is crucial. ATM systems must ensure precise trajec-
tory forecasts to maintain safe separation standards and prevent potential
collisions [98]. However, trajectory prediction is affected by inherent uncer-
tainties, particularly influenced by dynamic meteorological conditions such
as wind. These uncertainties arise from incomplete data and model inaccura-
cies in numerical weather prediction, which affect the velocity and precision
of trajectory forecasts [40].
The complexity of trajectory prediction is further compounded by the
nonlinear dynamics of atmospheric interactions, making it challenging to
reliably predict aircraft paths under varying conditions. Existing conflict
resolution methodologies often struggle to adequately account for these un-
certainties, risking violations of safety standards during real-time operations.
Addressing uncertainties in trajectory prediction is essential for enhanc-
ing conflict resolution methodologies and ensuring robust air traffic man-
agement. By developing methods that accurately estimate and incorporate
uncertainty into trajectory forecasts, more resilient and adaptive solutions
can be implemented to maintain safety and efficiency in increasingly con-
gested airspace environments.
The complexities of ATM systems demand innovative methodologies to
effectively address the challenges posed by escalating air traffic congestion.
Recognizing the critical role of accurate air traffic complexity analysis in
enhancing operational efficiency and safety, this thesis endeavors to introduce
novel approaches for analyzing and managing air traffic complexity.
Addressing the uncertainties inherent in ATM is imperative for devel-
oping robust and adaptive solutions that ensure safe and efficient air traffic
operations. By accurately assessing air traffic complexity, it becomes possible
to identify congestion hotspots, optimize airspace utilization, and mitigate
potential safety risks.

Through empirical analysis and theoretical inquiry, this study seeks to
provide insights into the complexities of ATM and inform the development
of strategies to address operational challenges in ATM systems. By offering
innovative methodologies for analyzing air traffic complexity, this thesis aims
to contribute to the advancement of ATM practices, ensuring the safety and
efficiency of air travel in increasingly congested airspace environments.

1.2 Background

Air traffic management (ATM) is a critical system that ensures the safety and
efficiency of air travel globally. ATM encompasses a wide range of services
including air traffic control, which manages the movement of aircraft on the
ground and in the air; air traffic flow management (ATFM), which regulates
the flow of aircraft to prevent congestion; and airspace management, which involves the strategic organization of airspace to optimize its use [40]. Together, these
components work to maintain safe separation between aircraft and manage
the complex dynamics of air traffic.
The growth in air traffic has led to increased congestion and complexity
within airspace systems. This is particularly evident in regions with dense
air traffic, such as Europe. The dense network of flight paths in these areas
highlights the challenges faced by modern ATM systems in managing high
volumes of traffic. Such congestion necessitates the implementation of ad-
vanced methodologies to ensure aircraft maintain safe separation distances
and to prevent conflicts.
Figure 1.1 illustrates the one-day traffic density of trajectories across Eu-
rope in two specific years: 2019 (see Figure 1.1a) and 2024 (see Figure 1.1b).
A comparative analysis of these images reveals a significant increase in traf-
fic density over the five-year period from 2019 to 2024, demonstrating the
increased complexity and congestion within European airspace.
Effective trajectory prediction is essential for managing air traffic and
preventing collisions. Predicting the future positions of aircraft allows air
traffic control to anticipate potential conflicts and take preemptive measures.
However, trajectory prediction is inherently complex due to various sources of
uncertainty, particularly those related to meteorological conditions. Winds
and other atmospheric factors can significantly affect aircraft velocity and

Figure 1.1: One-day traffic density of trajectories across Europe [96]. (a) 2019; (b) 2024.

direction, leading to deviations from planned paths [98]. Understanding and


mitigating these uncertainties are crucial for accurate trajectory prediction.
Meteorological conditions, especially winds, are a major source of uncertainty
in trajectory prediction. Numerical weather prediction models often suffer
from inaccuracies due to incomplete atmospheric data and model limitations.
These uncertainties are further amplified by the nonlinear and chaotic na-
ture of atmospheric dynamics, making simple statistical methods inadequate
for accurate representation [40]. This can lead to errors in predicting the
velocity and trajectory of aircraft, complicating the task of maintaining safe
separation.
Numerous methods have been developed to resolve potential conflicts
between aircraft. These approaches generally involve predictive modeling
to forecast aircraft positions and algorithms to suggest corrective actions.
However, many traditional methods do not fully account for the dynamic
uncertainties inherent in trajectory predictions, resulting in solutions that
may be sensitive to perturbations and increasing the risk of separation stan-
dard violations [98].
Air traffic complexity refers to the intricate interactions and dynamics
of aircraft within airspace. It includes factors such as the number and types
of aircraft, their velocities, trajectories, and the various environmental and
operational conditions that affect their movements. In contrast, airspace
complexity involves the structural aspects such as the number and location
of airports, airways, and air traffic control sectors [36]. Both forms of com-
plexity are interrelated and must be managed effectively to ensure efficient
and safe airspace operations. In this study, we focus how to estimate the air
traffic complexity.

Air traffic complexity is a multifaceted concept crucial for ATM, in-
volving various interdependent parameters. Despite extensive research, no
consensus on a single definition exists due to the numerous and highly cor-
related factors involved [89]. These parameters include airspace structure,
weather conditions, and technological factors. Complexity assessment tra-
ditionally relies on the subjective evaluations of air traffic controllers, but
there is a shift towards objective indicators based on various air traffic pa-
rameters [36, 89]. Key indicators include Aircraft Density, Dynamic Den-
sity, Interval Complexity, Fractal Dimension, Input/Output Approach, and
Lyapunov Exponents, each with distinct strengths and limitations [68].
Traditional airspace structures are often rigid and lack the flexibility
to adapt to the dynamic demands of modern air traffic. This rigidity can
lead to inefficiencies and potential safety hazards, particularly in high-density
traffic areas. ATCOs manage traffic within these structures, but their ability
to handle large volumes of traffic is limited by cognitive constraints and
workload [68]. This highlights the need for more adaptable and responsive
ATM strategies.
Complexity maps have emerged as valuable tools for visualizing and
managing air traffic complexity. These maps provide detailed insights into
the operational dynamics within airspace, helping to identify areas of high
traffic density and potential conflicts [93]. By analyzing complexity maps,
ATCOs and other stakeholders can make more informed decisions about
aircraft routing and sector management, potentially reducing conflict risks
and improving operational efficiency.
Recent research has focused on developing advanced metrics and method-
ologies for evaluating air traffic complexity. For example, geodesic metrics
and other mathematical models have been used to quantify the level of dis-
order in aircraft trajectories. While various approaches have been proposed to deal with air traffic complexity, only a few papers address managing air traffic and reducing the risk of conflicts between aircraft. In 2009, Lee et al. [68] proposed a strategy for optimizing the entry points of aircraft before reaching the sector boundary based on a complexity map. They showed that a complexity map can provide a visual representation of how a system responds to disturbances and can be used to identify key drivers of complexity and inform decision-making. In the context of ATM, a complexity map can provide valuable insights into the behavior of the system under different traffic conditions (such as the entry point angle and heading of the entering aircraft) and environmental factors (like convective weather), and help identify areas where improvements can be made. By analyzing the complexity map, the air traffic control system can identify areas of high traffic density or other factors that may increase the risk of conflicts between aircraft [93]. In this research, complexity maps are considered as an air traffic evaluation approach that illustrates how to manage a sector's conflicts based on the entering aircraft. Unlike other approaches, Hong et al. proposed a new method called Variance Quadtree-Based Map Building to manage conflicts by determining the best strategy when adding an aircraft into a sector. This can help improve the safety and efficiency of air traffic control operations, as well as reduce delays and improve overall airspace capacity [57]. In 2020, Juntama et al. [63] used the König metric to build a complexity map to reduce and optimize the air traffic for strategic 4D trajectories. Delahaye et al. [36] quantified the air traffic complexity of a set of aircraft trajectories by approximating a metric based on a linear dynamical system, which fits a vector field as closely as possible to the observed aircraft positions and speeds. This system can therefore evaluate the local disorder of a set of trajectories in the neighborhood of a given aircraft at a given time, based on the eigenvalues of the associated matrix A; organized traffic situations correspond to eigenvalues lying in a vertical strip around the imaginary axis. It provides a nuanced understanding of how aircraft interact within congested airspace.
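As a hedged sketch of the idea in [36] (the least-squares formulation, variable names, and the toy data below are our assumptions, not the paper's exact procedure), one can fit local linear dynamics ẋ = A x + b to observed positions and velocities and inspect the eigenvalues of A:

```python
import numpy as np

def local_disorder(positions, velocities):
    """Fit v = A @ x + b by least squares and return the eigenvalues of A.

    positions, velocities: (n_aircraft, dim) arrays of observed states in
    the neighborhood of interest. Eigenvalues near the imaginary axis
    suggest organized (e.g. rotating or translating) traffic; eigenvalues
    far from it indicate contracting/expanding, disordered traffic.
    """
    n, dim = positions.shape
    # Design matrix [x, 1] so that v = A x + b comes out of one solve.
    X = np.hstack([positions, np.ones((n, 1))])
    coeffs, *_ = np.linalg.lstsq(X, velocities, rcond=None)  # (dim+1, dim)
    A = coeffs[:dim].T                                       # v = A x + b
    return np.linalg.eigvals(A)

# Example: a pure rotation field (organized traffic) yields eigenvalues
# on the imaginary axis.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
v = x @ np.array([[0.0, -1.0], [1.0, 0.0]]).T  # v = R x, a rotation
print(local_disorder(x, v))                    # approximately +1j, -1j
```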
This background provides a comprehensive overview of the key chal-
lenges and considerations in ATM. The increasing demand for airspace, the
complexities of trajectory prediction, and the rigid structure of traditional
ATM systems underscore the need for innovative approaches. This research
aims to address the critical gaps in trajectory prediction, heading angle dis-
tribution reconstruction, and air traffic complexity management. By tack-
ling these issues, the primary objective of this research is to introduce novel
methodologies aimed at enhancing the understanding and management of
air traffic complexity in ATM systems. Specifically, the research seeks to
address the following objectives:

• Propose a novel approach to enhance the accuracy and robustness of flight trajectory prediction.
• Investigate methodologies for reconstructing heading angle distribu-
tions accurately to improve ATM.

• Propose a new metric to build and evaluate air traffic complexity maps,
thereby improving safety and efficiency in ATM.

• Develop a method to estimate instantaneous aircraft density at the pre-strategic stage and adapt it for broader applications, including species distribution modeling.

By achieving these objectives, this research aims to contribute to the advancement of knowledge and practices in ATM, thereby promoting a safer, more efficient, and more resilient airspace environment. Through empirical
analysis and theoretical inquiry, this study attempts to provide insights into
the complexity of air traffic and inform the development of strategies to
address operational challenges in ATM systems.

1.3 Air traffic complexity


Air traffic complexity refers to the intricacy and difficulty involved in man-
aging air traffic within a given airspace. This complexity arises from various
factors including the number of aircraft, their performance characteristics,
the structure of airspace, weather conditions, human factors, and technolog-
ical advancements. Complexity in air traffic can be spatial, temporal, and
influenced by dynamic interactions between different elements.
Understanding air traffic complexity is crucial for ensuring the safety
and efficiency of ATM. High complexity can lead to increased workload for
air traffic controllers, higher chances of conflicts, and potential delays. By
understanding and managing complexity, it is possible to optimize air traffic
flow, enhance safety, and improve overall operational efficiency.
Air traffic complexity is an objective and critical measure for deter-
mining the operational state of a given airspace. It is a concept used to
assess the difficulty of handling the air traffic situation. Air traffic com-
plexity is a multi-faceted subject that has received a great deal of attention
in the aviation industry. Many researchers have identified several factors
that contribute to airspace’s inherent complexity, such as the diversity of
aircraft types and capabilities, the various types of airspace classifications,
the presence of numerous air traffic control systems, unpredictable weather
patterns, and varying levels of air traffic volume. In addition, the frequency

of aircraft travelling through a specific sector over a given time period is
referred to as air traffic density or congestion. Higher air traffic congestion
makes air traffic accidents more likely, so more stringent monitoring becomes necessary. The increase in complexity might have a negative effect on the controller’s decision-making ability, resulting in more errors. Thus, so-called hot-spots of air traffic congestion require controllers to determine whether conflict avoidance or modification is required for any planned trajectories. Measuring air traffic complexity is a significant issue in
this regard.
In recent years, there has been a growing interest in developing meth-
ods to assess the complexity of air traffic. This has led to a variety of ap-
proaches to evaluate air traffic complexity which can be broadly categorized
into three major groups, including geometry-based approaches, agent-based
models, and machine learning techniques, as listed in Figure 1.2.

Figure 1.2: Some proposed approaches to evaluate the air traffic complexity.

Geometry-based approaches focus on quantifying the geometric features


of airspace, such as the number of sectors, conflicts, and aircraft trajecto-
ries. Typically, geometry-based approaches utilize the geometric properties
of airspace sectors, such as their size, shape, and orientation. Some common
geometry-based approaches include sector complexity index, conflict detec-
tion rate, and trajectory density. Specific metrics used in geometry-based approaches include proximity [37], convergence [37], the Grassmannian metric [38], König [63], fractal dimension [83], and the robust extension

[119]. While these metrics have proven effective in simple scenarios, they are
unsuitable for complex or large-scale applications. The approaches also fail
to account for spatial-temporal data [124]. The current work attempted to
overcome these challenges by implementing an air traffic complexity metric
based on linear dynamical systems. Numerous studies have examined traffic
structures and airspace geometries to demonstrate the efficacy of this ap-
proach for estimating a metric of local disorder and the interaction of trajectory sets
in ATM. Its applicability derives from its efficiency, relative simplicity, and
mathematically predictable behavior. For example, topological entropy can
evaluate the air traffic complexity by capturing the connectivity and struc-
ture of the airspace [129, 39, 32]. Shannon entropy and the Kullback-Leibler
divergence were proposed to capture the complexity of air traffic in both
spatial and temporal dimensions, and can be used to evaluate complexity at
different levels of air traffic control [132, 65].
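As a toy illustration of how such entropy indicators behave (the binning and the use of nats are our choices, not those of the cited works), the Shannon entropy of a sector's empirical heading distribution separates aligned from disordered flows:

```python
import numpy as np

def heading_entropy(headings_rad, n_bins=36):
    """Shannon entropy (in nats) of the empirical heading distribution."""
    hist, _ = np.histogram(headings_rad % (2 * np.pi),
                           bins=n_bins, range=(0.0, 2 * np.pi))
    p = hist / hist.sum()
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())

# Aligned flow (low entropy) vs. uniformly scattered headings (high entropy).
rng = np.random.default_rng(1)
print(heading_entropy(rng.normal(0.0, 0.05, 500)))      # roughly 0.7 nats
print(heading_entropy(rng.uniform(0, 2 * np.pi, 500)))  # near ln(36) = 3.58
```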
Agent-based models use a simulation-based approach to model the be-
havior of individual aircraft and the interaction between them. This approach
considers various factors, such as weather, airspace structure, and traffic de-
mand, to assess the complexity of air traffic. Agent-based models have been
used to analyze ATFM and air traffic control procedures. Dynamic den-
sity [11, 67] lists only a few factors, each of which is
assigned a subjective weight. Aspects such as the quantity of aircraft, the
number of aircraft with heading changes greater than 15 degrees or speed
changes greater than 10 knots, the sector size, and others are taken into ac-
count. Some papers are related to the application of agent-based simulation
in various aspects of air traffic management such as simulating aircraft taxi-
ing operations [74], assessing unmanned aircraft systems integration in the
national airspace system [27, 127], evaluating and simulating air traffic flow
management strategies [126, 128, 44, 46].
Machine learning techniques have become increasingly popular in the
field of air traffic complexity evaluation due to their ability to process large
amounts of data and extract patterns and relationships that may not be easily
identifiable through traditional analytical methods. These approaches have
been used in air traffic management to predict traffic flows, identify conflicts,
and optimize air traffic control procedures. For example, some studies developed machine learning techniques to evaluate air traffic complexity using flight delay, cancellation, rerouting, and rescheduling [137], or using traffic volume, density, and complexity level [69]. Other studies used machine learning approaches to predict and manage air traffic complexity by considering air traffic volume, flow, weather conditions, and airport capacity [21, 4, 28, 121, 137]. In 2023, Alharbi et al. [5] used CNN and LSTM
to predict the air traffic flow as well as compute the air traffic complexity.
The metrics used in the studies varied, including flight delay, cancellation,
rerouting, rescheduling, traffic volume, density, complexity level, air traffic
flow, weather conditions, and airport capacity. The papers highlighted the
potential of machine learning in accurately predicting and managing air traf-
fic complexity. However, limited data availability and potential biases in the
data were identified as potential limitations in some studies.
Hybrid approaches for air traffic complexity evaluation typically com-
bine two or more different techniques, such as geometric metrics, agent-based
models, or machine learning, in order to obtain a more comprehensive and
accurate understanding of the complexity of the airspace system. These ap-
proaches have become increasingly popular in recent years, as they allow for
a more nuanced evaluation of air traffic complexity that takes into account
both the geometric and dynamic aspects of the system. One notable ex-
ample is the hybrid approach proposed by Marwala and Nelwamondo, which
combines machine learning and agent-based methods for air traffic flow man-
agement. This method considers metrics such as delay, capacity, and safety,
providing a holistic view of air traffic complexity. However, it requires a
significant amount of data to train the machine learning models effectively,
highlighting the importance of robust data collection and management in air
traffic complexity assessment [81]. Another significant contribution is the
work of Arribas et al., who integrated geometric metrics (such as distance,
density, angle, and altitude difference) with machine learning techniques.
Their approach was validated using real air traffic data, demonstrating its
effectiveness in capturing the multifaceted nature of air traffic complexity.
By leveraging both geometric and data-driven insights, this hybrid method
enhances the accuracy and reliability of complexity evaluations, providing
valuable information for air traffic controllers and management systems [9].
Incorporating hybrid approaches into air traffic complexity assessment
offers several advantages. They enable the integration of multiple data
sources and methodologies, resulting in a more detailed and robust analysis.
This, in turn, supports the development of more adaptive and responsive air
traffic management strategies, crucial for handling the increasing demands
of modern airspace operations.

1.3.1 Factors contributing to air traffic complexity

Air traffic complexity is influenced by various factors that affect the manage-
ment and coordination of aircraft movements. Understanding these factors
is essential for improving the efficiency and safety of air traffic operations.
The design and organization of air routes are critical in determining air
traffic complexity. Air routes are predetermined paths that aircraft follow,
and their configuration can significantly influence traffic flow, congestion,
and the potential for conflicts. Efficiently designed air routes help manage
complexity by reducing bottlenecks and distributing traffic more evenly. Ad-
ditionally, airspace is divided into control sectors, each managed by a specific
air traffic control unit. The size, shape, and traffic volume of these sectors
greatly impact complexity. Smaller sectors with high traffic volumes or com-
plex geometries can increase the difficulty for controllers to manage aircraft
movements safely and efficiently.
Traffic density refers to the number of aircraft within a specific volume
of airspace at a given time. High-density areas, such as those near major
airports or busy air corridors, significantly increase complexity. Managing
high-density traffic requires precise coordination and efficient use of available
airspace to minimize conflicts and delays. Controllers must be adept at
handling a large number of aircraft simultaneously, ensuring safe separation
and smooth flow.
Different aircraft types have varying performance characteristics, includ-
ing speed, maneuverability, and size. These differences contribute to air
traffic complexity as controllers must consider the performance limitations
and capabilities of each aircraft when planning and coordinating movements.
Mixed traffic, involving both fast and slow aircraft, adds to the challenge,
requiring careful sequencing and spacing to maintain safety and efficiency.
Weather conditions, such as storms, turbulence, and low visibility, have
a substantial impact on air traffic complexity. Adverse weather can force
changes in flight paths, increase separation requirements, and lead to delays.
Seasonal variations also play a role, with certain times of the year posing more
significant weather-related challenges than others. Controllers and pilots
must adapt quickly to changing conditions to maintain safe and efficient
operations.

Human factors, including the workload, decision-making processes, and
communication between air traffic controllers and pilots, are critical ele-
ments of air traffic complexity. High complexity can lead to increased stress
and fatigue among controllers, potentially impacting their performance and
decision-making abilities. Effective training, communication protocols, and
support systems are essential for managing these human factors. Ensuring
that controllers are well-prepared and supported can mitigate the impact of
high complexity on human performance.
Technological advancements, such as Automatic Dependent Surveillance-
Broadcast (ADS-B), Next Generation Air Transportation System (NextGen),
and the Single European Sky ATM Research (SESAR) initiative, profoundly
impact the management of air traffic complexity. These technologies enhance situa-
tional awareness, improve communication, and provide more accurate and
timely information, helping to reduce complexity and improve efficiency. By
leveraging these advancements, air traffic management can become more
proactive and adaptive, ensuring smoother and safer operations.
Given these factors, accurately determining the complexity of airspace
is crucial for effective air traffic management. Understanding complexity
helps in resource allocation, workload distribution, and strategic planning,
ensuring that both safety and efficiency are maintained. Among the factors
discussed, traffic density is often the most influential, as it directly impacts
the number of aircraft that must be managed simultaneously, thus driving
the need for precise coordination and effective use of airspace. Identifying
and addressing areas of high complexity allows for targeted improvements
and innovations in air traffic control strategies.

1.3.2 Air traffic complexity metrics

In a control sector, the control workload increases non-linearly with the num-
ber of aircraft. There exists a threshold beyond which controllers can no
longer accept additional aircraft, necessitating rerouting through less con-
gested neighboring sectors. This state, known as sector saturation, must be
avoided as it leads to cumulative overloading in preceding sectors, potentially
backing up to departure airports. Estimating the saturation threshold is com-
plex and depends on factors such as route geometry, sector layout, aircraft
distribution, and controller performance. A widely accepted threshold is 3

conflicts and 15 aircraft per sector, with a maximum duration of 10 minutes
to avoid excessive controller stress, ensuring optimal safety conditions.
Control workload measurement is critical in many areas of ATM, as
it underpins optimization processes. Applications include airspace com-
parison (United States/Europe), validation of future concepts (SESAR, NEXTGEN), analysis of traffic control action modes, sectorization optimiza-
tion, and traffic assignment optimization. Additionally, it is essential for
determining congestion pricing zones, developing organic control assistance
tools, generating 4D trajectories, and predicting congested zones.
Currently, the operational capacity of a control sector is measured by
the maximum number of aircraft traversing the sector within a given time
frame. This metric does not account for traffic orientation, treating struc-
tured and disordered traffic equally. Thus, controllers may accept traffic
exceeding operational capacity in structured scenarios while rejecting traffic
in disordered scenarios even if capacity is not reached. Therefore, measuring
the number of aircraft per unit time is an insufficient metric for representing
the difficulty of a traffic situation.
Ideally, a metric should precisely measure the mental effort required to
manage a set of aircraft. While challenging, complexity metrics extending
beyond simple aircraft counts are possible. Two essential notions are:

• Control Workload: Measures the difficulty for the traffic control system, whether human operators or automated processes, to manage
a situation. This workload is linked to cognitive processes such as
conflict prediction and resolution, and trajectory monitoring.
• Traffic Complexity: Intrinsically measures the complexity of a traf-
fic situation, independent of the managing system, based solely on
trajectory geometry. This complexity is linked to sensitivity to ini-
tial conditions and conflict interdependencies. Uncertainty in positions
and speeds complicates future trajectory predictions, sometimes expo-
nentially increasing, making reliable extrapolation difficult. Conflict
resolution can generate new conflicts, influenced by trajectory mixing
levels. As illustrated in Figure 1.3, traffic situations vary in difficulty
based on predictability and trajectory interdependency levels. For ex-
ample, in a context where radar imaging equipment fails, the situation
on the right would attract immediate attention due to its difficulty in conflict prediction and high trajectory interdependency. The mid-
dle situation, though presenting significant conflict risk, is manage-
able by giving a uniform direction to all aircraft to place them in safe
roundabout trajectories. The left situation, with its non-challenging
trajectories and stable relative distances, presents minimal immediate
difficulties.

Figure 1.3: An example of three traffic scenarios classified in order of increasing complexity [38].
Research into air traffic complexity metrics has gained considerable
attention in recent years, particularly in the US and Europe. The first
projects began in Germany in the 1970s, and the subject has since devel-
oped. Currently, National Aeronautics and Space Administration (NASA),
Massachusetts Institute of Technology, and Georgia Tech are working on
the NextGen project in the US, while in Europe, Direction des Services de
la Navigation (DSNA), Deutsches Zentrum für Luft- und Raumfahrt (Ger-
man Aerospace Center), and Nationaal Lucht- en Ruimtevaartlaboratorium
(Netherlands Aerospace Centre) are involved in similar activities related to
SESAR.
Thus, metrics are essential for measuring and managing air traffic com-
plexity. They provide quantitative data that can be analyzed to understand
the level of complexity in a given airspace and to develop strategies for man-
aging it. Different metrics focus on various aspects of complexity, offering a
comprehensive view when used together. In this chapter, we will present the
following three approaches to measure air traffic complexity:

• Flow-based metrics.

• Geometry-based metrics.

• Dynamic-based metrics.

Flow-based metrics

Flow-based metrics measure complexity based on traffic flow and density. They focus on the number of aircraft passing through a specific area over a period and the interactions between these aircraft. Traffic density metrics
measure the number of aircraft within a defined airspace volume, while the
traffic flow rate assesses the rate at which aircraft enter and exit a specific
sector or route. Flow-based metrics are particularly useful in high-density
airspaces, such as those around major airports. Monitoring traffic density
can help identify potential bottlenecks and enable proactive management to
prevent congestion. Flow-based metrics are straightforward and easy to in-
terpret, making them useful for quick assessments. However, they may not
capture the full complexity of interactions between aircraft or account for
dynamic changes in traffic patterns.
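As a minimal illustration (not taken from the thesis), a basic flow-based indicator can be computed by counting the distinct aircraft inside a sector during a time window; the sketch below assumes trajectory records stored in a pandas DataFrame with hypothetical columns time, flight_id, lon, and lat.

import pandas as pd

def sector_density(df: pd.DataFrame, t0, t1,
                   lon_min, lon_max, lat_min, lat_max) -> int:
    # Keep only position reports inside the time window [t0, t1).
    in_window = df[(df["time"] >= t0) & (df["time"] < t1)]
    # Keep only reports inside the rectangular sector.
    in_sector = in_window[in_window["lon"].between(lon_min, lon_max)
                          & in_window["lat"].between(lat_min, lat_max)]
    # Count distinct aircraft rather than raw reports.
    return in_sector["flight_id"].nunique()

Such a count treats structured and disordered traffic identically, which is precisely the limitation of flow-based metrics noted above.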

Geometry-based metrics

Geometry-based metrics (based on the geometric distribution of speed vectors in the airspace) focus on the spatial aspects of air traffic complexity. They consider the positions, trajectories, and separation distances between
aircraft. Proximity measures evaluate horizontal and vertical separation be-
tween aircraft, while conflict probability assesses the likelihood of potential
conflicts based on current trajectories. Geometrical metrics are particularly
useful in sectors with complex airspace structures. For example, evaluating
proximity measures can help ensure safe separation distances are maintained,
reducing the risk of conflicts. Geometrical metrics provide detailed insights
into spatial relationships between aircraft, offering a clear picture of poten-
tial conflicts. However, they can be complex to calculate and may require
advanced tools and software for accurate assessment.
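As a hedged sketch of a proximity measure, the following routine flags pairs of aircraft whose horizontal separation falls below a threshold; positions are assumed to be already projected onto a local plane in nautical miles, and the 5 NM threshold is only indicative.

import numpy as np

def close_pairs(positions, min_sep_nm=5.0):
    # positions: dict mapping flight_id -> (x, y) in nautical miles.
    ids = list(positions)
    pairs = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            dx, dy = np.subtract(positions[ids[a]], positions[ids[b]])
            dist = np.hypot(dx, dy)  # Euclidean horizontal distance
            if dist < min_sep_nm:
                pairs.append((ids[a], ids[b], dist))
    return pairs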

Dynamic-based metrics

Dynamic-based metrics, which use a dynamical system (linear or non-linear) to model air traffic, focus on the temporal aspects and changes in traffic patterns over time. They consider how traffic evolves and the stability of traffic
flows. Temporal stability assesses how stable traffic patterns are over time,
while the rate of change in traffic patterns measures the speed at which traffic
patterns change, indicating potential volatility. Dynamical metrics are useful
in managing air traffic during peak periods or unexpected disruptions. For
example, assessing temporal stability can help predict and manage traffic
surges, ensuring smoother operations. Dynamical metrics provide insights
into the temporal dynamics of air traffic, helping to anticipate and manage
fluctuations. However, they can be challenging to interpret without compre-
hensive data and may require sophisticated analytical tools.
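As a simple illustration of a temporal-stability indicator, one can track how quickly the aircraft count in a sector changes between successive time steps; this is only one of many possible dynamical measures and is not the specific metric used later in this thesis.

import numpy as np

def count_volatility(counts):
    # counts: aircraft counts for one sector at successive time steps.
    changes = np.diff(counts)       # step-to-step variation
    return float(np.std(changes))   # larger value = less stable pattern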
Accurately measuring air traffic complexity is crucial for effective air
traffic management. Metrics enable air traffic controllers and planners to
allocate resources efficiently, distribute workloads effectively, and implement
strategic planning to maintain safety and efficiency. Among the various met-
rics, those related to traffic density often have the most significant impact,
as they directly influence the number of aircraft that need to be managed si-
multaneously. Understanding and utilizing these metrics allows for targeted
improvements in air traffic control strategies, helping to address areas of high
complexity and enhance overall ATM.

1.4 Contributions

Here, we address the following challenges. First and foremost, in the context of flight trajectory prediction, ensuring high accuracy is piv-
otal for effective air traffic management. Existing methods often struggle
to capture the intricate spatial and temporal dependencies inherent in flight
data, leading to sub-optimal prediction performance.
Additionally, reconstructing the heading angle distribution plays a cru-
cial role in understanding the directional dynamics of airspace traffic. How-
ever, traditional approaches fail to provide a comprehensive framework for re-
constructing these distributions accurately from observed or simulated data.

Moreover, assessing spatial traffic complexity is essential for optimizing
ATM strategies. Current methodologies lack a systematic approach to quan-
tify the intricate spatial relationships within airspace, hindering efforts to
improve overall airspace efficiency and safety.
The four key contributions of the thesis are:

1. A method named BayesCoordLSTM for enhancing flight trajectory prediction performance. In the BayesCoordLSTM method, a coordinate transformation step is proposed to convert the flight trajectory coordinates into cylindrical coordinates. Then, we propose a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architecture to capture the spatial features and temporal dependencies of flight data. Finally, the Bayesian theorem, which provides a powerful framework for incorporating prior knowledge and updating predictions based on new information, is employed. The Bayesian theorem allows for a more nuanced and probabilistic approach to trajectory prediction: it can better estimate the uncertainties inherent in trajectory predictions and support more effective strategies for managing air traffic in increasingly congested airspace. Experimental results on a real dataset confirm that the proposed model enhances predictive capabilities, providing probabilistic trajectory forecasts with increased reliability.
2. A method for constructing spatial traffic complexity maps grounded in
the Maximum Entropy principle: This contribution addresses the chal-
lenge of reconstructing heading angle distributions from observed or
simulated data. By employing the principle of Maximum Entropy, this
methodology incorporates statistical estimation of Fourier coefficients
to assess density and employs Mahalanobis distance to optimize den-
sity via partially finite convex programming. Our approach enables the
precise reconstruction of heading angle distributions, offering valuable
insights into the directional behavior and dynamics of air traffic flow.
The resulting probability distributions are then used to produce spatial
complexity maps, which provide a visual and analytical representation
of air traffic complexity. These maps can be instrumental in identi-
fying areas of potential congestion and in guiding traffic management
strategies to ensure the safe and efficient movement of air traffic. The
detailed information are to be exploited to produce complexity maps,

17
as explained in the next point. The resulting probability distributions
are to be exploited to produce complexity maps, as explained in the
next point.

3. A method for building a spatial complexity map: This contribution presents a systematic framework for assessing spatial traffic complexity within airspace. The methodology assesses local air traffic complexity using the Maximum Entropy principle and information geometry on generalized von Mises distributions. Utilizing information geometry, it computes geodesic distances to the uniform distribution and creates complexity maps by using a Composite Complexity Index that combines angular distribution and traffic intensity. These maps offer a nuanced understanding of air traffic complexity, enabling more informed ATM strategies. We demonstrate the numerical efficiency and effectiveness of the strategy on real data.

4. A method for reconstructing spatial density based once again on the Maximum Entropy principle. This is a fairly general method, which
could be used in ATM for the reconstruction of instantaneous aircraft
density in airspace, at the pre-strategic stage. This method largely uti-
lizes the tools from the chapter on the reconstruction of angular den-
sities, but their use differs significantly. The developed approach not
only addresses the unique requirements of spatial density reconstruc-
tion in aviation but is also versatile and adaptable, making it suitable
for a broad range of other applications in statistics. For instance, it
could be effectively employed in environmental studies for the mod-
eling and prediction of species distributions across different habitats,
thereby highlighting its versatility beyond aviation.

1.5 Structure of the thesis


The thesis is organized as follows: Chapter 2 introduces a new method to predict flight trajectories, named BayesCoordLSTM. It is a hybrid model that applies coordinate transformation and the Bayesian theorem within a ConvLSTM model. The proposed model leverages the spatial features gleaned by the
CNN architecture and the temporal dependencies captured by LSTM to en-
hance the accuracy of trajectory predictions. By incorporating the Bayesian theorem, our model provides probabilistic trajectory forecasts and associated confidence levels, while coordinate transformation enhances spatial awareness
and predictive capabilities. We test on the flight data collected from the
Hartsfield–Jackson Atlanta International Airport (ATL). Our experimental
results demonstrate the effectiveness of the proposed BayesCoordLSTM-
based approach in improving flight trajectory prediction accuracy, with a fo-
cus on Mean Absolute Error (MAE) and Root Mean Square Error (RMSE)
values. The integration of the Bayesian theorem and Coordinate transfor-
mation into ConvLSTM models represents a substantial advancement in the
field of flight trajectory prediction.
An important issue in air traffic management is the construction of traf-
fic complexity maps. Observed or simulated data provide a statistical sample
of trajectory angles. The underlying angular density is a crucial component
in assessing local complexity. In Chapter 3, we propose a Maximum Entropy
method for reconstructing the angular densities. These densities are con-
strained by the estimation of their Fourier coefficients. Accounting for the
statistical properties of moment estimation is achieved by using the square of
the Mahalanobis distance between statistical data and Fourier coefficients of
the sought density as the data fitting term. Partially finite convex program-
ming is then implemented to compute the optimal density. The cornerstone
of this approach, enabling a rigorous treatment of variational aspects of the
chosen method, is an infinite-dimensional version of Fenchel’s duality theo-
rem, coupled with results on the conjugacy of integral functionals. We also
discuss the numerical aspects of our approach and present some simulations
and reconstructions from simulated data.
In order to define and compute the air traffic complexity map, we use
information geometry. Chapter 4 focuses on the methodology for building
air traffic complexity maps by using the geodesic distance between the Max-
imum Entropy densities obtained in the previous chapter and the uniform
distribution, which corresponds for a given density of traffic to the maximal
complexity. We also test on real flight data, demonstrating robustness; alternative criteria are possible by comparing heading distributions at grid cell
boundaries. Our strategy for building complexity maps is both numerically
efficient and effective in capturing air traffic complexity.
In Chapter 5, we develop a Maximum Entropy approach to the eval-
uation of spatial density. This methodology extends upon previous contributions in the field, particularly those related to the modeling of species
distribution. However, it introduces significant refinements by avoiding the
discretization of the sample space, thus allowing for a more continuous and
precise representation of spatial density. This continuous approach provides
a more accurate framework for analyzing complex spatial patterns, which is
particularly valuable in various scientific and engineering applications. While
this chapter focuses primarily on the theoretical development and demonstra-
tion of the methodology, its potential use in ATM, particularly for optimizing
airspace utilization and enhancing safety, will be explored and expanded upon
in future work.
Finally, Chapter 6 provides a summary of the thesis, gives conclusions
and potential research directions based on the presented results and method-
ologies that were explored.

Chapter 2

Applying the Bayesian method to improve flight trajectory prediction
Flight trajectory prediction is essential for aviation, playing a key role in effi-
cient air traffic management and ensuring safe flight operations. Traditional
methods often struggle to manage the complex spatio-temporal dependencies
and uncertainties inherent in aviation data. In this study, we propose a novel
approach, BayesCoordLSTM, for trajectory prediction. This hybrid model
combines Convolutional LSTM (ConvLSTM) with the Bayesian method and
coordinate transformation. ConvLSTM leverages CNN architecture to ex-
tract spatial features, while LSTM captures temporal dependencies. By in-
corporating the Bayesian method, our model offers probabilistic trajectory
forecasts with associated confidence levels. Additionally, coordinate trans-
formation enhances spatial awareness and predictive accuracy. Experimental
results show that the BayesCoordLSTM model significantly improves predic-
tion accuracy, as indicated by MAE and RMSE metrics. The integration of
the Bayesian method and coordinate transformation into ConvLSTM models
represents a significant advancement in flight trajectory prediction.

2.1 Introduction

Deep neural networks (DNNs) are extensively applied in various domains, including safety-critical areas such as self-driving automobiles and security
systems that use face and image recognition. However, DNNs are considered deterministic models because all unknown parameters, such as net-
work weights, biases, and model predictions, are single point-estimates. This
determinism necessitates a large dataset for training, which makes convolu-
tional neural networks (CNNs) vulnerable to overfitting on small datasets.
Consequently, DNNs may perform well on training data but fail to general-
ize to new data, leading to overconfidence or uncertainty in their predictions.
This issue can be addressed using the Bayesian method.
The Bayesian method has become crucial for enhancing DNNs. In a Ba-
yesian model, the posterior distribution provides comprehensive information
about unknown parameters given the data. Various techniques, including
Markov Chain Monte Carlo (MCMC), Laplace approximation, expectation
propagation, and variational inference, have been employed to approximate
the posterior distribution. These methods make Bayesian models robust
to overfitting, capable of estimating uncertainty, and effective at learning
from smaller datasets. Variational inference, in particular, minimizes the
Kullback-Leibler divergence between a proposed approximate posterior and
the true unknown posterior distribution of the weights, addressing the in-
tractability of the exact Bayesian method.
In air traffic management (ATM), the increasing airspace demand puts
pressure on current systems to ensure aircraft maintain safe separation stan-
dards to avoid collisions [40]. This necessitates accurate predictions of air-
plane orbits in the near future [98]. However, this is challenging due to the un-
certainty inherent in predicting aircraft trajectories. Therefore, understand-
ing and managing this uncertainty is crucial in ATM, where meteorological
factors, especially winds, represent significant sources of uncertainty. Uncer-
tainty in weather forecasts arises from inadequate or inaccurate knowledge
of the atmosphere’s state at the time of prediction and the uncertainties of
the model [40]. These uncertainties propagate through nonlinear and chaotic
atmospheric dynamics, making simple statistical characterization inadequate
for depicting prediction uncertainty. Consequently, these factors can affect
aircraft velocity and the accuracy of trajectory predictions. Most methods
for flight conflict resolution do not account for this uncertainty. As a result,
they produce solutions like collision-free flight routes that are vulnerable to
disturbances and increase the risk of violating separation standards.
As air transport data volumes increase, novel analysis techniques and
operational strategies are crucial in ATM for more precise trajectory predic-
tion models. However, uncertainties, including weather and unknown variables from un-modeled actors, remain significant challenges in predicting
flight trajectories [88]. Researchers have explored many approaches, from
physics-based methods relying on flight dynamics to data-driven methods
using machine learning algorithms to analyze historical data [70]. As de-
mands on ATM grow and the need for more accurate predictions intensifies,
there is a rising interest in data-driven approaches to overcome the limita-
tions of traditional methods. One significant challenge in predicting aircraft
trajectories is the uncertainty arising from factors such as changing weather
conditions. Many researchers utilize the Bayesian method to quantify uncer-
tainty in predicting time-series data, including flight trajectory prediction in
ATM.
Despite significant advancements in trajectory prediction, data-driven
approaches remain susceptible to overfitting and often struggle to generalize
well to new and unseen flight patterns. Furthermore, information about
an aircraft's position, altitude, and heading is typically processed relative to
a reference point, making coordinate transformation essential for accurately
understanding and predicting aircraft trajectories. In this chapter, we aim to
improve flight trajectory prediction accuracy by integrating Bayesian theory
and coordinate transformation in a proposed model called BayesCoordLSTM.
The primary contributions of this study can be summarized as follows:

• Coordinate transformation. By adopting coordinate transformation, the model gains a significant advantage in spatial awareness and
predictive capabilities. This transformation enhances the model’s un-
derstanding of the spatial aspects of aviation trajectories, allowing it
to effectively relate positions and headings to a reference point. As a
result, the model is able to make more accurate predictions.

• Bayesian integration. We apply the Bayesian method to update the weights of hyperparameters in a hybrid prediction method that
combines CNN and LSTM. The 1D convolutional layer of the CNN
extracts spatial features, while the LSTM captures the temporal fea-
tures of the flight dataset. By incorporating Bayesian principles, our
model effectively handles uncertainty in predictions, offering probabilis-
tic trajectory forecasts along with confidence levels. This comprehen-
sive approach not only enhances predictive accuracy but also optimizes
hyperparameters.

• Empirical validation. We test our model on a real dataset and
compare its prediction performance with three models, including 3D-
CNN [95], CNN-Gated Recurrent Unit (GRU) [106], and a combination
of CNN-GRU with 3D Convolution (CG3D) [107]. The analysis high-
lights the superior accuracy and robustness of our proposed model in
flight trajectory prediction, demonstrating its practical utility in ATM.

The remainder of this chapter follows this structure: Section 2.2 presents
a brief literature review on flight trajectory prediction. Basic information on LSTM is presented in Section 2.3. Section 2.4 illustrates our model
and the experimental results of a real flight dataset are presented in Sec-
tion 2.5. Finally, Section 2.6 summarizes the conclusions and outlines future work in this field.

2.2 Literature review


In recent years, the demand for air travel has significantly increased, leading
to congestion in the control sectors. With the rapid increase in global air
traffic, particularly in high-density airspace regions, ensuring the safety and
efficiency of air transportation has become a critical challenge. Air traffic
control is a service provided by ground-based controllers who direct aircraft
on the ground and through controlled airspace, and can provide advisory
services to aircraft in non-controlled airspace.
The primary purpose of air traffic control is to prevent collisions, orga-
nize and expedite the flow of air traffic, and provide information and other
support for pilots. This involves enforcing traffic separation rules to ensure
that each aircraft maintains a minimum amount of empty space around it
at all times. Despite advancements, tactical air traffic control decisions to
ensure safe separation between aircraft are still made by human controllers,
similar to practices from 50 years ago [113, 15, 33].
According to the Federal Aviation Administration (FAA) and Euro Con-
trol Action Plan 16 (AP16), a trajectory is defined in four dimensions: lati-
tude, longitude, altitude, and time. Due to factors such as airspace conges-
tion, weather conditions, temporary military activities, and airspace restric-
tions, the actual and planned flight paths often differ (see Figure 2.1).

Figure 2.1: Different stages of a flight [135].

As part of ATM, ATFM provides services that complement air traffic control. ATFM aims to ensure an optimal flow of air traffic through ar-
eas where demand exceeds capacity, thus protecting air traffic control from
overload situations that could compromise safety [3, 72]. Accurate flight tra-
jectory prediction is vital for preventing traffic jams and collisions, optimizing
airspace utilization, and mitigating congestion challenges [136].
Flight trajectory prediction involves estimating a flight’s future posi-
tion, altitude, heading, and speed based on historical data and current state.
There are two main types of trajectory prediction: strategic and tactical pre-
dictions. Strategic prediction forecasts possible future trajectories based on
flight plans, weather forecasts, and historical patterns, while tactical predic-
tion incorporates dynamic information like current flight status and airspace
congestion. Additionally, trajectory prediction can be categorized into short-
term and medium to long-term based on the time scale [125].

• Short-term prediction. This involves predicting the flight trajectory over a period of a few minutes or less. Due to the brief forecast
period, detailed information about weather conditions and long-term
flight intentions is not required. However, assumptions such as fixed
aircraft control and constant rotational speed are necessary. These
assumptions are valid for a shorter propagation period, enhancing prediction accuracy while reducing the size of the prediction interval. This
type of prediction facilitates immediate conflict risk detection, allowing
for effective conflict resolution. Additionally, accurate trajectory pre-
dictions can support the generation of viable alternative trajectories,
accommodating existing constraints.

• Medium to long-term prediction. This involves predicting the flight trajectory over a period of ten minutes or more. Due to the ex-
tended prediction horizon, it requires essential information such as long-
term flight intentions, environmental data, flight performance data, and
navigation data. The inherent uncertainties in these inputs can reduce
prediction accuracy and extend the prediction period. The primary
purpose of this type of prediction is to assess airspace flow and develop
flight plans by the aircraft operations center or flight control center.

Trajectory prediction methods can generally be categorized into two primary approaches: physics-based methods and data-driven approaches.
Physics-based methods rely on classical mechanics and aerodynamics
principles to model flight motion. These methods use mathematical equa-
tions to describe the dynamics of a flight, taking into account forces like lift,
drag, thrust, and gravity. In this approach, the trajectory prediction is based
on the flight dynamics model, motion characteristics, and the Markov prop-
erty of its state [136]. This approach is highly solvable and performs well in
short-term predictions, especially for large flights with relatively stable mo-
tion parameters. The next moment’s state of the flight is closely linked to its
current state, following the principles of Markov theory [123]. For instance,
researchers frequently apply the Kalman Filter to estimate future trajecto-
ries based on noisy measurements [138]. These approaches provide valuable
insights into flight dynamics and are often employed for trajectory estimation
under controlled and predictable conditions. However, as the prediction time
span increases, the error in these state estimation models tends to escalate
rapidly [136].
In the context of air traffic control, short-term trajectory predictions
(typically up to 6 seconds on radar screens) rely on extrapolating speed
trends using simple state estimation models. Ongoing efforts aim to improve
accuracy in long-term trajectory predictions by addressing challenges related
to uncertainties and nonlinearities in flight dynamics over extended periods.

Therefore, in recent years, data-driven approaches have gained popu-
larity, leveraging historical flight data and other relevant variables to learn
patterns and relationships. These patterns are used to create a data-driven
model, which is trained to represent the collective behavior of the trajec-
tories based on input features (e.g., past positions, velocities) and future
trajectory predictions using machine learning models. According to [35], al-
though machine learning is one of the most researched topics in computer
science, it has not yet been widely adopted by end-users in the ATM domain. This research demonstrates that the number of papers on machine learning in ATM has significantly increased in recent years, especially in predicting and optimizing hotspots to avoid collisions and managing traffic flows, one of the most consistent subjects of machine learning in ATM. A machine learning model can extract common traits by learning from extensive historical trajectory data. These techniques can also incor-
porate additional inputs such as aircraft performance, air route structures,
and environmental factors (data-to-data). Machine learning models consis-
tently provide accurate predictions and benefit from a wide range of trainable
parameters. Supervised learning, in particular, is a well-known approach
used by many researchers for trajectory prediction. By learning from large
datasets of historical trajectory data, these models can generalize and predict
future trajectories effectively. Furthermore, these methods can utilize flight
performance metrics, flight path structures, and environmental parameters
as additional inputs for enhanced trajectory prediction.
Machine learning algorithms consistently provide accurate predictions
and profit from a wide range of trainable parameters [34]. These approaches
often use filter-based methods [94, 130] or deep learning-based methods [107]
to extract the properties of data changes through data analysis to forecast
the trajectory. Recently, significant advancements have been made in flight
trajectory prediction through the use of advanced machine learning tech-
niques, particularly CNN and Recurrent Neural Network (RNN). Among the
various types of RNNs, LSTM networks have proven particularly effective
for handling sequences and time series data. In 2021, Shi et al. proposed a
constrained LSTM model specifically for predicting flight trajectories [111].
In their research, they defined three types of constraints: climbing, cruis-
ing, and descending/approaching phases. Their constrained LSTM model
incorporates these phase-specific constraints to account for the dynamic na-
ture of flight, ensuring both long-term dependencies and realistic physical constraints. Compared to a standard LSTM, this model significantly en-
hances prediction accuracy by maintaining trajectory continuity and effec-
tively managing sparse waypoints in flight data. Other methods applied the
Bayesian method in machine learning models: Graph Convolutional Network
to forecast the traffic flow in the next 15, 30, 45, and 60 minutes (SZtaxi
and Los-loop) [45], a deep Gaussian process [29], LSTM and the LSTM’s
variants, including gated LSTM [31], convolutional LSTM [110, 87, 108],
CNN-LSTM [73, 76], deep LSTM [87, 131, 30, 1], and sequence-to-sequence
LSTM [114]. In these works, CNNs were used to extract spatial features from local regions, while RNNs and LSTMs captured long-range dependencies.
These studies provide valuable references for the model architecture used in
this work.
LSTM networks are designed to handle sequential data and capture
temporal dependencies effectively, making them well-suited for tasks involv-
ing time series data. However, these models are not inherently equipped
to capture spatial features unless they are modified or combined with other
methods. In applications such as flight trajectory prediction, where both
temporal and spatial features are crucial, a standard LSTM may fall short
in adequately addressing spatial aspects. This is why additional constraints
or hybrid models that incorporate spatial information are often necessary.
In this context, CNN is more suitable for extracting spatial features. Hence,
a hybrid CNN and LSTM [76, 71] or combined CNN and GRU [107, 116]
approaches have been widely used in classification and prediction tasks. For
example, Shafienya et al. (2022) [106] compared various CNN-GRU models
for predicting 4D flight trajectories using ADS-B data. CNNs extract spa-
tial features from flight data, while GRUs capture temporal dependencies.
Combining CNNs with GRUs enhances prediction accuracy by effectively
capturing both spatial and temporal patterns. They subsequently extended
their research by combining a CNN-GRU with a 3D-CNN, which effectively
extracts spatial-temporal features for prediction [107]. However, a potential
drawback of their models is their reliance on large datasets for training and
evaluation, as well as the complexity involved in parameter tuning for the
combined CNN-GRU and 3D-CNN architectures.
Additionally, these approaches often learn to predict only specific types
of flight trajectories. To handle different trajectory types or routes, re-
searchers must retrain the model or adjust its parameters. Although ma-
chine learning has achieved significant advancements in trajectory prediction, it still has limitations, such as learning only specific types of trajectories.
Therefore, if the model needs to predict different types of flights or routes,
it must be retrained by adding or adjusting hyperparameters. This process
increases both the cost and the time required for retraining.
Moreover, in flight trajectory prediction, uncertainty can arise from fac-
tors such as weather conditions or unpredictable aircraft behaviors, as well
as from the data itself or the optimal values of model parameters. Failure to
capture these uncertainties can result in models that do not provide prob-
abilistic predictions or associated confidence levels, ultimately diminishing
prediction accuracy. This can have significant implications for safety and
performance in the aviation industry.
To address this problem, the Bayesian framework offers several advan-
tages: it accommodates various prior ranges, avoids costly computations,
integrates well with LSTM models for continuous-time responses, and effec-
tively handles large datasets. Many researchers have applied the Bayesian
method to various models, including Graph Transformers [88] and CNN-GRU
architectures [59]. This approach not only estimates model uncertainties but
also optimizes predictions to enhance performance [10, 2].
In a 3D space, as aircraft move, information about their position, alti-
tude, and heading is processed relative to a reference point. Without coordi-
nate transformation, the model may not effectively capture the real spatial
understanding or accurately depict the complex variations in flight trajecto-
ries [118]. This limitation diminishes the accuracy and reliability of aircraft
position predictions. Therefore, integrating the Bayesian method with coor-
dinate transformation into these models enhances their capability to handle
flight trajectory predictions. This combination improves the model’s ability
to manage uncertainty, benefiting ATM by enhancing safety and efficiency,
and optimizing model parameters for better and more stable performance.
By learning from historical data, data-driven approaches can adapt to
changing conditions and improve the accuracy and reliability of flight path es-
timation. However, challenges such as generalizing from sparse data, dealing
with unpredictable weather conditions, navigating dynamic airspace struc-
tures, and achieving real-time prediction remain. To address these challenges,
integrating physics-based knowledge with data-driven techniques presents a
promising solution. This hybrid approach leverages the strengths of both
methods, offering enhanced prediction accuracy and reliability in ATM [70].

2.3 Long Short-Term Memory

2.3.1 LSTM architecture

Long Short-Term Memory (LSTM) is an advanced variant of RNN introduced by Hochreiter and Schmidhuber in 1997 [56]. The main purpose of LSTM is
to overcome the limitations of traditional RNNs, such as difficulties in learn-
ing and remembering long-term dependencies due to the vanishing gradient
or exploding gradient problems. LSTM allows neural networks to retain in-
formation for longer periods and learn long-term dependencies, which are
important for tasks involving sequential data, such as natural language pro-
cessing, speech recognition, and time series prediction. Figure 2.2 illustrates
the LSTM architecture.

Figure 2.2: LSTM architecture [56].

The architecture of LSTM consists of a series of cells connected sequentially, where each cell C at time step t is responsible for storing and updating
information from the previous time step (t − 1). The key improvement of
LSTM over traditional RNNs lies in the structure of these cells, which include
three primary gates that manage and regulate the flow of information:

• Forget gate f : decides which information from the previous cell state
should be discarded.

• Input gate i: determines which new information should be added to the cell state.

• Output gate o: controls which part of the cell state should be output
as the current hidden state.

These gates consist of sigmoid layers, with output values between 0 and 1.
These values decide how much information to retain or discard, enabling
LSTMs to better control the flow of information over time.

2.3.2 LSTM operation


LSTM works through a series of operations to update the cell state and
compute the hidden state at each time step t. We denote by xt , Ct and ht the input, the cell state and the hidden state at time t, respectively.
The operations in LSTM can be divided into the following steps:
First, the LSTM determines which information from the previous cell
state (Ct−1 ) should be forgotten or retained. This decision is made through
the forget gate, which takes the previous hidden state ht−1 and the current
input xt as inputs:
ft = σ(Wf · [ht−1 , xt ] + bf ). (2.1)
The output ft is a value between 0 and 1, representing the fraction of infor-
mation to retain from the previous cell state. If ft = 0, all the information
is discarded, and if ft = 1, all the information is retained.
Next, the LSTM decides which new information will be added to the
current cell state. This is done by the input gate, where two main operations
take place:

• A sigmoid layer determines which values should be updated:

it = σ(Wi · [ht−1 , xt ] + bi ). (2.2)

• Then, a tanh layer creates a vector of candidate values C̃t , representing potential new information to be added to the cell state:

C̃t = tanh(Wc · [ht−1 , xt ] + bc ). (2.3)

The new cell state Ct is updated by combining the filtered previous state
and the new candidate values. This update is performed as follows:

Ct = ft ∗ Ct−1 + it ∗ C̃t , (2.4)

where ft helps in discarding or retaining the old information while it and C̃t
add new information to the cell state.
Finally, the LSTM decides what to output as the hidden state at the
current time step through the output gate. The output gate uses a sigmoid
layer to filter the cell state and determine which parts of it should be used
to compute the hidden state:

ot = σ(Wo · [ht−1 , xt ] + bo ). (2.5)

The final hidden state ht is computed by multiplying ot by the tanh of the updated cell state Ct :
ht = ot ∗ tanh(Ct ).
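To make the gate equations (2.1)-(2.5) concrete, the following minimal NumPy sketch performs a single LSTM time step; the concatenation [ht−1 , xt ] follows the equations above, and all weight matrices and biases are placeholders of compatible shapes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate, Eq. (2.1)
    i_t = sigmoid(W_i @ z + b_i)        # input gate, Eq. (2.2)
    c_hat = np.tanh(W_c @ z + b_c)      # candidate values, Eq. (2.3)
    c_t = f_t * c_prev + i_t * c_hat    # cell state update, Eq. (2.4)
    o_t = sigmoid(W_o @ z + b_o)        # output gate, Eq. (2.5)
    h_t = o_t * np.tanh(c_t)            # hidden state
    return h_t, c_t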

2.4 Proposed model


While LSTMs excel at capturing temporal dependencies, they may not ad-
equately capture spatial features. On the other hand, CNNs are well-suited
for extracting spatial information. Consequently, the combination of CNNs
and LSTMs has been widely adopted for various classification and prediction
tasks [109, 58].
Moreover, applying the Bayesian method to LSTM models has been
proposed by many researchers [82, 122, 133, 105]. The Bayesian frame-
work is flexible with various prior ranges, avoids expensive computations,
and integrates with LSTM models to develop continuous-time responses and
handle large datasets. This approach allows hyperparameters to be tailored
to the specific characteristics of the data. Therefore, it not only estimates
the model’s uncertainty but also optimizes predictions [8, 10, 2, 112].
To enhance the performance of flight trajectory prediction, we propose
a hybrid model named BayesCoordLSTM. This model combines coordinate
transformation and the Bayesian method within a CNN-LSTM framework.
As illustrated in Figure 2.3, the BayesCoordLSTM comprises two main com-
ponents: a Convolutional Neural Network (CNN) and a Bayesian LSTM
network.
The first component of the BayesCoordLSTM is a CNN, which extracts
spatial features from the input data, allowing the model to effectively capture and process intricate spatial patterns in flight trajectories. The second com-
ponent is a Bayesian LSTM network, designed to capture long-term temporal
dependencies. By incorporating the Bayesian method, the LSTM network
enhances its ability to manage uncertainties and provide more robust predic-
tions over extended time sequences. Together, these components enable the
BayesCoordLSTM to deliver superior performance in flight trajectory pre-
diction by integrating advanced spatial and temporal data processing with
probabilistic reasoning.

Figure 2.3: The BayesCoordLSTM framework to predict flight trajectory.

In the context of this research, the inputs and outputs of our model for
flight trajectory prediction are defined as follows:

• Input: X , Xit , Pit .
Where:
– X = {Xi } denotes the set of flight trajectories. Here i represents
the trajectory number ranging from 1 to N , where N is the total
number of flight trajectories. Each Xi represents the flight path
of trajectory i.
– A trajectory includes a series of multidimensional trajectory points ordered by time: Xi = {Pit , Pit+1 , . . . , Pit+M } is a list of M points of flight trajectory i. Pit denotes the point of the observed flight trajectory i at time t.
– Each trajectory point Pit is represented as
Pit = (longit , latit , altit , vit , θit ),
where each point has five components, corresponding to longi-
tude, latitude, altitude, speed, and heading of the flight at time t,
respectively.

• Output: (Pit+M+1 , Pit+M+2 , . . . , Pit+2M ) represents the predicted flight trajectories.

For example, assume that we use 100 points to predict five points in
the future, so that M = 100. For flight i, starting at time t = 1, we have Xi = (Pi1 , Pi2 , . . . , Pi100 ). The output then becomes (Pi101 , Pi102 , . . . , Pi105 ),
corresponding to five points in the future.

2.4.1 Coordinate transformation


In the trajectory prediction framework, both the input and output trajec-
tories are converted into discretized vectors to ensure numerical stability in
deep learning models. Flight trajectories involve latitudes, longitudes, and
altitudes that span different value ranges. To achieve uniformity, these vari-
ables are normalized based on their specific minimum and maximum values
for a given airport. However, this normalization approach lacks generaliz-
ability across different airports due to varying latitude and longitude ranges.
To address this issue, trajectories are transformed into a cylindrical coor-
dinate system, aligning flight states relative to the airport’s position [24].
Given the flight position xf p = (longf p , latf p , altf p ) and the airport position
xap = (longap , latap , altap ), we can transform coordinates as follows:

R = ∥xfp − xap ∥ / Dmax ,   (2.6)

θ = arcsin((longfp − longap ) / ∥xfp − xap ∥),   (2.7)

h = |altfp − altap | / Amax ,   (2.8)
where:

• Dmax is the distance extending from the airport and encompassed by the predictive scope of the framework;

• Amax corresponds to the highest achievable altitude within the predic-
tive domain;

• R, h: the normalized distance to the airport and the normalized altitude difference;

• θ: the angle (heading) with respect to the north direction.

In this thesis, we choose Dmax = 328084 feet and Amax = 39370 feet to
predict aircraft trajectories, balancing computational efficiency and practi-
cality. These values provide sufficient coverage while conserving computa-
tional resources and align with typical aircraft operating ranges and altitudes,
ensuring relevance to real scenarios.
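A minimal sketch of the transformation in Equations (2.6)-(2.8) is given below. For brevity it applies the norm directly to the raw (longitude, latitude, altitude) differences; a production implementation would first convert all components to a common distance unit.

import numpy as np

D_MAX = 328084.0  # feet, predictive range chosen in this thesis
A_MAX = 39370.0   # feet, maximum altitude considered

def to_cylindrical(x_fp, x_ap):
    # x_fp, x_ap: (longitude, latitude, altitude) of flight and airport.
    diff = np.asarray(x_fp, dtype=float) - np.asarray(x_ap, dtype=float)
    dist = np.linalg.norm(diff)        # ||x_fp - x_ap||, assumed nonzero
    R = dist / D_MAX                   # Eq. (2.6)
    theta = np.arcsin(diff[0] / dist)  # Eq. (2.7)
    h = abs(diff[2]) / A_MAX           # Eq. (2.8)
    return R, theta, h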

2.4.2 CNN layer


The Conv1D layer plays a crucial role in high-level feature extraction, enabling the network to effectively understand and process temporal dependencies within the input data. In this layer, the input consists of 6
channels, representing different features (timestamp, longitude, latitude, al-
titude, speed, and heading). The layer utilizes 1-dimensional filters that slide
across the time dimension to capture relevant patterns within the data.

1. Convolution operation: Each filter performs a convolution operation, which involves multiplying the filter weights with the input values and
summing the results to produce a single output for each position. This
process highlights important features by learning spatial hierarchies.

2. Activation function: After convolution, the ReLU (Rectified Linear Unit) activation function is applied. The ReLU function introduces
non-linearity by retaining positive values while setting negative values
to zero, allowing the model to learn complex relationships within the
data.

3. Output: The output from the Conv1D layer consists of 32 feature maps,
each representing a learned pattern from the input data. This enriched
representation retains critical information for subsequent layers in the
model.
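The shape bookkeeping described above can be checked with a short snippet; assuming a TensorFlow/Keras implementation, a batch of trajectory segments with 100 time steps and 6 features yields 32 feature maps per time step.

import tensorflow as tf

x = tf.random.normal((8, 100, 6))  # hypothetical batch: 8 segments, 100 steps, 6 features
conv = tf.keras.layers.Conv1D(filters=32, kernel_size=3,
                              padding="same", activation="relu")
y = conv(x)
print(y.shape)  # (8, 100, 32): one value per filter and time step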

2.4.3 Bayesian optimization
Bayesian optimization is a powerful technique for optimizing complex, high-
dimensional functions with costly evaluations, such as the hyperparameters
of deep learning models. Incorporating the Bayesian methods into LSTM
models, as proposed by several researchers [82, 99, 139, 88, 59], involves cre-
ating a probabilistic model to capture the uncertainty in the function being
optimized. This approach is advantageous because it enables a more efficient
and informed optimization process, leveraging prior knowledge to minimize
the number of evaluations required. The Bayesian framework’s flexibility
with various prior ranges, its avoidance of expensive computations, and its
compatibility with LSTM models for continuous-time responses make it well-
suited for handling large datasets. By utilizing Bayes’ theorem, the hyperpa-
rameters of the LSTM model can be tuned to specific characteristics of the
data, enhancing model performance and providing insights into uncertainty.
In the context of Bayesian LSTM modeling, where uncertainty is critical
for parameter estimation, we use a general formula for sampling weights W
and biases b at the tth time step of the nth layer. These are given by Equa-
tion (2.9) and Equation (2.10), respectively:

W(n)^(t) = N (0, 1) ∗ log(1 + ρ(w)^(t)) + µ(w)^(t),   (2.9)

and

b(n)^(t) = N (0, 1) ∗ log(1 + ρ(b)^(t)) + µ(b)^(t),   (2.10)
where N (0, 1) represents sampling from a standard Gaussian distribution,
introducing randomness in weight and bias initialization, and ρ and µ are
discussed hereafter. This randomness is vital for effectively exploring the
solution space during training.
The term log(1 + ρ) transforms the estimated standard deviation ρ of
the input feature into a non-negative space, ensuring the non-negativity of
sampled weights and biases, which is essential for numerical stability during
both training and inference.
Moreover, incorporating the mean value µ derived from the training
data into the sampling process enables the model to utilize prior knowledge
about the distribution of weights and biases. This enhances parameter esti-
mation efficiency and improves the model’s performance to capture complex
temporal dependencies.
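A direct NumPy transcription of Equations (2.9) and (2.10) could look as follows; note that many Bayesian deep learning implementations use the softplus log(1 + exp(ρ)) instead, while here we follow the equations as written, so ρ > −1 is assumed.

import numpy as np

def sample_parameter(mu, rho, rng=None):
    # mu, rho: arrays of the same shape holding the mean and the raw
    # scale parameter; the deviation is log(1 + rho) as in Eqs. (2.9)-(2.10).
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.standard_normal(np.shape(mu))  # N(0, 1) noise
    return eps * np.log1p(rho) + mu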

In this phase, Bayesian optimization leverages the Bayesian methods
to compute the posterior distribution of the objective function, assisting in
the selection of hyperparameters for LSTM models. It iteratively refines
the model based on past evaluations to optimize performance. If the re-
sults are unsatisfactory, the hyperparameters and architecture are reassessed;
otherwise, successful outcomes guide future predictions using the optimized
weights. In our proposed model, Bayesian optimization plays a crucial role
in identifying key LSTM hyperparameter values (see Figure 2.4 and Algo-
rithm 1).

Figure 2.4: Optimizing hyperparameters using the Bayesian method.

Algorithm 1 Bayesian Optimization for LSTM Hyperparameters
1: Input: Set of flight trajectories X , Total number of iterations N , Length
of trajectory points M , Objective function f , Acquisition function A.
2: Output: Θ (Set of observed pairs of hyperparameters and objective
function values).
3: Initialize Θ ← ∅; n ← 0
4: Define X = {Xi }
5: while n ≤ N do
6: x∗ ← argmaxx A(x | Θ)
7: y ← f (x∗ )
8: Θ ← Θ ∪ {(x∗ , y)}
9: Train the LSTM model using the updated Θ
10: Compute the weights W and biases b based on Equation (2.9) and
Equation (2.10).
11: n ← n + 1
12: end while
13: Return Θ

The selection function is the criterion by which the next set of hyperparameters is chosen from the surrogate function. The most common choice of criterion is Expected Improvement (EI) [62]:
EI(x) = ∫_{−∞}^{+∞} max(y − y∗ , 0) p(y | x) dy,   (2.11)
where:

• y ∗ is a threshold value of the objective function.


• x is the proposed set of hyperparameters.
• y is the actual value of the objective function using hyperparameters x.
• p(y | x) is the surrogate probability model expressing the probability
of y given x.

If p(y | x) is zero for all y < y∗ , it indicates that the hyperparameters x are unlikely to produce improvements.
it suggests that the hyperparameters x are expected to perform better than
the threshold value.
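When the surrogate p(y | x) is Gaussian with mean µ(x) and standard deviation σ(x), the integral in Equation (2.11) admits a well-known closed form; the sketch below computes it for a maximization setting, with y∗ taken as the best objective value observed so far.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    # Closed form of Eq. (2.11) for a Gaussian surrogate N(mu, sigma^2).
    if sigma <= 0.0:
        return 0.0  # no predictive uncertainty, hence no expected gain
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)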

2.5 Experimental results

2.5.1 Data preparation

Dataset

The Automatic Dependent Surveillance-Broadcast (ADS-B) system is one of the most popular tools used to collect flight data. It utilizes satellite nav-
igation to determine and broadcast a flight position to air traffic control
and other nearby aircraft. This capability enhances situational awareness
by making flights visible to each other, thereby facilitating self-separation.
Each trajectory includes essential information such as timestamp, flight ID,
position (longitude, latitude, altitude), speed, heading, hour, etc.
In this thesis, our analysis uses the flight data collected from the Harts-
field–Jackson Atlanta International Airport (ATL) [107], with a sample row
of 4D trajectory data provided in Table 2.2.

Table 2.1: 4D Trajectory data features

Feature Unit Description


Timestamp Unix Unix timestamp indicating the time when data was
recorded.
Flight ID Icao24 Unique identifier (Icao24) representing each indi-
vidual flight.
Longitude Degree Geographic coordinate indicating the east-west po-
sition of the aircraft in degrees.
Latitude Degree Geographic coordinate indicating the north-south
position of the aircraft in degrees.
Altitude Feet Altitude above sea level measured in feet.
Speed Knot Speed of the aircraft relative to the ground, mea-
sured in knots (nautical miles per hour).
Heading Degree Direction of aircraft travel relative to true north,
expressed in degrees.
Hour Unix Unix timestamp indicating the specific hour when
data was recorded.

It is noteworthy that records with the same identifier correspond to the same aircraft, and a collection of these records together forms the tra-
jectory Xi . In this research, we utilize the complete set of trajectory data
sourced from geographical coordinates associated with ATL airport. Each
trajectory is recorded at one-second intervals, providing detailed track points
for analysis.

Table 2.2: An example of 4D Flight trajectory information.

Feature Value
Timestamp 1537009474
Flight ID e8044e
Longitude -84.402633978396
Latitude 33.7542572021484
Altitude 12275.82
Speed 260.53122347432
Heading 12.1974800580645
Hour 1537009200

The dataset includes both static and dynamic information, with the
focus on dynamic data such as heading, speed, and 4D data (timestamp,
longitude, latitude, and altitude). Updates to this data occur every 5 seconds.
To address the continuity and smoothness of flight trajectories [76], missing
points are filled using the cubic spline interpolation, which offers a smooth
and accurate method for filling in gaps in the dataset (see Figure 2.5). To
manage the spatial range of the input data and enhance forecast accuracy
and smoothness, we employ a sliding window approach with a window size
of 100 seconds and a step size of 1 second. This method segments the data
into 105-second intervals, where the first 100 seconds are used as historical
data and the last 5 seconds serve as the prediction window.
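Gap filling with a cubic spline can be sketched as follows for one feature at a time (e.g., altitude); SciPy's CubicSpline is assumed, and the observation times must be strictly increasing.

import numpy as np
from scipy.interpolate import CubicSpline

def fill_missing(obs_times, obs_values, full_times):
    # Fit a cubic spline through the observed samples and evaluate it
    # on the complete one-second grid, filling the gaps smoothly.
    spline = CubicSpline(obs_times, obs_values)
    return spline(full_times)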
Given a series of trajectory points P1 , P2 , P3 , . . . , Pn , with a time window
size of 100, the trajectory at time t can be expressed as
Xit = {Pit−99 , Pit−98 , Pit−97 , . . . , Pit }.
Each trajectory segment contains 100 consecutive trajectory points. The
sliding window is moved forward continuously to perform the segmentation
operation until the last trajectory point is reached. The segmented trajectory
is denoted by
X = {Xit , Xit+1 , Xit+2 , . . . , Xin },

where t = 100, 101, . . . , n. The experimental process is shown in Figure 2.6.

Figure 2.5: An example of addressing missing data before (left) and after (right) using cubic spline interpolation.

Figure 2.6: Sliding time window.
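The segmentation itself can be written as a short routine: given an array of consecutive trajectory points, it returns the 100-point history windows and the 5-point prediction targets, with a step size of 1 as described above.

import numpy as np

def sliding_windows(points, history=100, horizon=5, step=1):
    # points: array of shape (n, d) holding consecutive trajectory points.
    X, Y = [], []
    last_start = len(points) - history - horizon
    for start in range(0, last_start + 1, step):
        X.append(points[start:start + history])                      # past 100 points
        Y.append(points[start + history:start + history + horizon])  # next 5 points
    return np.asarray(X), np.asarray(Y)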

This results in 21258 trajectories, which are then divided into training and testing sets of 17006 and 4252 samples, respectively.

• Training set: The LSTM model is trained using Bayesian optimization to adjust parameters iteratively, minimizing prediction error and leveraging probability distributions to represent uncertainty and enhance effectiveness, particularly with outliers.

• Testing set: The trained LSTM model is evaluated on a testing dataset to assess performance on unseen data. Predictions are compared with
ground truth values using metrics like RMSE and MAE to quantify
accuracy, helping gauge the model’s ability to generalize and providing
insights into overall performance.

Normalization

In this thesis, we use min-max normalization: features are rescaled to a standardized range as follows:

Xnew = (X − min(X)) / (max(X) − min(X)),   (2.12)
where:

• Xnew is the normalized value, obtained by scaling the original value X to fit within a standardized range.
• min(X) and max(X) denote the minimum and the maximum values in
the dataset X, respectively.

2.5.2 Hyperparameter tuning


Hyperparameters are crucial for shaping the architecture of a machine learn-
ing model. Optimal hyperparameters can markedly enhance the model’s
performance. In our proposed model, we outline the dimensional adjust-
ments of both input and output data for every layer. Initially, the input
data comprises a 3D tensor with dimensions (None, 100, 6), where "None"
denotes the number of batch samples during model training. The detailed
information is described in Table 2.3.
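For orientation, a deterministic Keras sketch of the layer stack summarized in Table 2.3 is given below; the Bayesian weight sampling of Equations (2.9)-(2.10) and the coordinate transformation step are omitted, so this is a simplified stand-in rather than the full BayesCoordLSTM.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 6)),                   # 100 time steps, 6 features
    tf.keras.layers.Conv1D(32, 3, padding="same",
                           activation="relu"),        # ReLU folded into the layer
    tf.keras.layers.MaxPooling1D(pool_size=2),        # -> (None, 50, 32)
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.LSTM(50),                         # -> (None, 50)
    tf.keras.layers.Dense(4),                         # 4 output components
])
model.compile(optimizer="adam", loss="mse")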

Table 2.3: Basic parameters of the proposed model.

Layer Input size Output size Parameter

Convolutional (None, 100, 6) (None, 100, 32) 1x3 kernels, 32 filters
ReLU (None, 100, 32) (None, 100, 32) -
Max Pooling (None, 100, 32) (None, 50, 32) Window size: 2
Dropout (None, 50, 32) (None, 50, 32) Dropout rate: 0.3
LSTM (None, 50, 32) (None, 50) 50 units
Fully Connected (None, 50) (None, 4) 4 nodes

2.5.3 Evaluation metrics

To evaluate model performance, we employ two common metrics: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).

• RMSE is used to measure the magnitude of errors in the predictions. It is calculated as the square root of the average of the squared differences between the predicted and observed values, providing a measure of the prediction error.
• MAE is used to evaluate the model’s performance by capturing the
average of the absolute differences between the predicted and observed
values across all instances in the test set, providing a measure of pre-
diction accuracy.

RMSE and MAE are respectively defined by:

RMSE = sqrt( (1/n) Σ_{t=1}^{n} (P̂it − Pit)² ),   (2.13)

and

MAE = (1/n) Σ_{t=1}^{n} |P̂it − Pit| ,   (2.14)

in which n represents the number of predicted points, while P̂it and Pit cor-
respond to the predicted and actual values at time step t of flight i.
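Both metrics in Equations (2.13) and (2.14) reduce to one-liners in NumPy:

import numpy as np

def rmse(pred, actual):
    # Eq. (2.13): square root of the mean squared difference.
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2)))

def mae(pred, actual):
    # Eq. (2.14): mean absolute difference.
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))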

2.5.4 Comparison of experimental results


To assess the effectiveness of our proposed BayesCoordLSTM method, we
evaluate its performance on the ATL dataset [107], as detailed in Section 2.5.1.
This evaluation comprises two primary comparative analyses:

(1) Comparison with state-of-the-art methods. We compare the BayesCoordLSTM with three state-of-the-art models: 3D-CNN [95], CNN-GRU [106], and CG3D [107]. This comparative analysis highlights the advancements and performance improvements of the BayesCoordLSTM
relative to these established alternatives.

(2) Impact of the Bayesian method and coordinate transformation. We assess, through controlled experiments, three configurations of the CNN-LSTM model: the baseline CNN-LSTM, CNN-LSTM with coordinate transformation, and CNN-LSTM with both the Bayesian method and coordinate transformation. This analysis shows how each enhancement improves the accuracy and reliability of flight trajectory predictions.

Comparison with state-of-the-art methods

We first compare the BayesCoordLSTM with the state-of-the-art models:


3D-CNN [95], CNN-GRU [106], and CG3D [107], using a system configured
with an Intel i7-8700 CPU, GTX 1080 Ti GPU, and 31 GB of RAM. Our
experimental results focus on the performance and training time of each
model, as illustrated in Table 2.4.

Table 2.4: Comparison with state-of-the-art methods.

Predicting model        RMSE     MAE      Training time (s)
CNN-GRU [106]           0.3728   0.2164   11503
3D CNN [95]             0.2646   0.1785   10060
CG3D [107]              0.2626   0.1776   9042
Our proposed method     0.2154   0.1624   9450

Among the evaluated models, the CNN-GRU model demonstrated the


highest RMSE value of 0.3728, signifying a larger discrepancy between pre-
dicted and actual values. Similarly, its MAE of 0.2164 indicates relatively
higher prediction errors. The 3D CNN model showed improvement with
an RMSE of 0.2646 and MAE of 0.1785, indicating a more accurate pre-
dictions than the CNN-GRU. Notably, the CG3D model achieved an even
lower RMSE of 0.2626 and MAE of 0.1776, positioning it as one of the top-
performing models in terms of prediction accuracy. In our proposed model,
it can be attributed to two key factors: using coordinate transformation and
the Bayesian method. By implementing coordinate transformation, which
facilitates consistent data normalization, we effectively reduced the impact

of outliers. Furthermore, the incorporation of the Bayesian method helped
prevent overfitting by using probability distribution during training, thereby
improving the model’s generalization ability. This combination of techniques
enabled our proposed model to achieve more accurate predictions. As seen
in Table 2.4, our method achieved an RMSE of 0.2154 and MAE of 0.1624,
demonstrating its superior predictive accuracy compared to the other models.
Although the training time of the BayesCoordLSTM, at 9450 seconds,
is slightly longer than that of CG3D, at 9042 seconds, this marginal in-
crease is outweighed by the significant improvements in both RMSE and
MAE. Therefore, the BayesCoordLSTM not only achieves superior predic-
tive performance but also maintains competitive computational efficiency,
underscoring its overall effectiveness in the comparative analysis.

Impact of the Bayesian method and coordinate transformation

We then explore the individual and combined impact of integrating the Ba-
yesian method and coordinate transformation within the BayesCoordLSTM
framework. Through a series of controlled experiments, we evaluate three
configurations of the CNN-LSTM model: the baseline CNN-LSTM, CNN-
LSTM with coordinate transformation, and CNN-LSTM with both the Baye-
sian method and coordinate transformation. This analysis provides insights
into how each enhancement contributes to improving the accuracy and reli-
ability of flight trajectory predictions.
The prediction comparisons for a single aircraft, as shown in Figure 2.7a,
Figure 2.7b, and Figure 2.7c, further illustrate these improvements. The
results demonstrate that the BayesCoord(CNN-LSTM) model consistently
produces the most accurate trajectory predictions, closely aligning with the
actual trajectory (see Figure 2.7d).
In Figure 2.7, the predicted and actual trajectories in 3D space are de-
picted. It can be observed that the predicted trajectories of the three models
exhibit similar trends to the actual trajectories, but notably, the CNN-LSTM
model’s predicted curve deviates significantly from the other two models.
The use of coordinate transformation in the Coord(CNN-LSTM) and the
proposed models helps reduce errors in predicting latitude, longitude, and
especially altitude. As shown in Figure 2.7d, the predicted trajectory by
the proposed model closely matches the actual trajectory with the smallest
prediction error, followed by the Coord(CNN-LSTM) and CNN-LSTM
models.

Figure 2.7: A comparison of three configurations of the CNN-LSTM model:
(a) CNN-LSTM; (b) Coord(CNN-LSTM); (c) our proposed model; (d) the
three configurations together.
Based on these results, the evaluation and comparison of the CNN-
LSTM, Coord(CNN-LSTM), and BayesCoord(CNN-LSTM) models in pre-
dicting trajectories in 3D space reveal significant differences in prediction ac-
curacy. The Coord(CNN-LSTM) model shows improvement in latitude and
longitude prediction compared to the baseline CNN-LSTM, highlighting the
effectiveness of coordinate transformation. Moreover, the BayesCoord(CNN-
LSTM) model achieves the best performance, producing trajectories closest
to reality with the smallest prediction errors among the evaluated models.

2.6 Conclusions

CNNs and LSTMs provide state-of-the-art performance in various tasks.


However, these models often face issues with overfitting on small datasets
and lack the capability to measure uncertainty, which negatively affects their
generalization abilities. Additionally, prediction tasks, especially those in-
volving time-series datasets, can encounter challenges due to complex long-
term fluctuations. Recently, the application of the Bayesian method in deep
learning has been introduced to estimate uncertainty in model predictions.
This approach is highly robust to overfitting and allows for uncertainty esti-
mation. In air traffic management, prediction is crucial for preventing traffic
jams and collisions. It typically involves predicting aspects such as delays,
flow, and trajectories. Therefore, improving performance in flight trajectory
prediction is of significant interest to many researchers.
In this chapter, we introduced a novel approach to enhance flight trajec-
tory prediction, named BayesCoordLSTM. This model efficiently integrates
Bayesian optimization and coordinate transformation into a hybrid CNN-
LSTM framework, achieving accurate flight path forecasts with improved
precision and efficiency. Our experiments, conducted using a real flight
trajectory dataset, clearly demonstrate the remarkable superiority of the
BayesCoordLSTM model over existing methods. Across various flight sce-
narios and conditions, this model consistently outperforms baseline models
in terms of trajectory prediction accuracy. The utilization of Bayesian opti-
mization not only streamlines hyperparameter tuning but also significantly

enhances predictive performance. While our study provides a solid founda-
tion for predicting flight trajectories, future research could explore dynamic
hyperparameter adaptation techniques. These techniques could involve real-
time adjustments to hyperparameters in response to evolving flight scenarios
or changing weather conditions, further enhancing the BayesCoordLSTM
model’s adaptability and real-time predictive capabilities.

Chapter 3

Reconstructing angular
probability density by using
Maximum Entropy
An important issue in air traffic management is the construction of traffic
complexity maps. Observed or simulated data provide a statistical sample
of trajectory angles. The underlying angular density is a crucial component
in assessing local complexity. Our approach involves applying the Maximum
Entropy principle, constrained by the statistical estimation of Fourier coef-
ficients for the desired density. Accounting for the statistical properties of
moment estimation is achieved by using the square of the Mahalanobis dis-
tance between statistical data and Fourier coefficients of the sought density
as the data fitting term. Partially finite convex programming is then imple-
mented to compute the optimal density. The cornerstone of this approach,
enabling a rigorous treatment of variational aspects of the chosen method, is
an infinite-dimensional version of Fenchel’s duality theorem, coupled with re-
sults on the conjugacy of integral functionals. We also discuss the numerical
aspects of our approach and present some simulations and reconstructions
from simulated data.

3.1 Introduction

In contemporary data analysis, the reconstruction of probability density func-


tions stands as a pivotal challenge across various domains, ranging from
physics to social sciences [25, 90]. The quest for robust methodologies capa-
ble of faithfully capturing underlying distributions from sparse data remains

relevant, particularly in complex systems such as those found in ATM [41,
47]. This chapter addresses this challenge by proposing a framework for re-
constructing angular probability density functions through the lens of the
Maximum Entropy principle.
The driving force behind our endeavor stems from the pressing need to
estimate the complexity of air traffic from flight trajectory data [63, 104].
With the exponential growth in air travel, understanding and managing the
intricate dynamics of air traffic flows has become paramount [36]. By har-
nessing the power of Maximum Entropy reconstruction, we aim to unveil
underlying patterns and structures within flight trajectory data, thereby
shedding light on the complexity of air traffic systems.
Our approach hinges on several key pillars, each meticulously designed
to navigate a specific aspect related to angular density reconstruction. First
and foremost, we formulate the reconstruction problem within the framework
of the Maximum Entropy principle. By leveraging this principle, we seek
solutions that strike a balance between fidelity to observed constraints and
maximization of entropy, thus ensuring maximal neutrality while remaining
consistent with available data.
A distinguishing feature of our methodology lies in the formulation of
constraints in terms of Fourier coefficients derived from statistical samples.
This not only allows us to encapsulate essential characteristics of the angular
distribution but also enhance the integration of empirical data into the recon-
struction process. Moreover, to account for the inherent variability within
the data, we introduce a fit term expressed via the squared Mahalanobis
distance, thereby weighing the evidence of each Fourier coefficient according
to its uncertainty expressed through the covariance term.
Critically, we provide a rigorous treatment of the underlying optimization
problem, which is posed in an infinite-dimensional space. Drawing upon
concepts from partially finite convex programming and leveraging insights
into the conjugate of convex integral functionals, we devise a robust method-
ology for tackling this challenge. Importantly, our approach avoids the need
for discretization, thereby offering an efficient numerical solution for com-
puting Maximum Entropy solutions.

3.2 Literature review

The Maximum Entropy principle, deeply embedded in statistical mechan-


ics and information theory, originates from Ludwig Boltzmann’s pioneering
efforts in the late 19th century. This foundational work led to the concept of
entropy, which Jaynes (1957) later placed at the center of probabilistic
reasoning [60]. It is considered a powerful tool for reconstructing probability
distributions from limited information. It seeks to maximize the entropy
of a probability distribution while satisfying given constraints derived from
observed data. Jaynes further elaborated on this concept, advocating for
the adoption of the probability distribution that maximizes entropy, consis-
tent with the available information [61]. Over the decades, its utility has
expanded in various domains from traditional physics to applications in bi-
ology, economics, and machine learning [25, 61].
The principle’s utility extends beyond its foundational domains, find-
ing significant application in fields such as image processing, pattern recog-
nition, and machine learning. Advancements in computational techniques
over time have increased its practical applicability, enabling its integration
into diverse scientific disciplines [26]. Today, it stands as a cornerstone of
probabilistic reasoning, providing a powerful framework for inference under
uncertainty [16]. Therefore, the Maximum Entropy principle continues to
inspire new methodologies and applications, ensuring its enduring relevance
in modern science.
In computational biology, Boomsma et al. (2014) effectively used Max-
imum Entropy to bridge computational models with experimental data, en-
hancing the reliability of biological simulations [17]. Their research highlighted
the method's ability to bridge gaps between computational models and
experimental data, indicating its broad applicability in sequence analysis, structural
modeling, and molecular simulations. Meanwhile, Wan et al. (2019) applied
the principle to refine data interpretation methods through Principal Com-
ponent Analysis, noting that the method’s increased computational demands
could limit its widespread application [120].
Furthermore, considering the inherent complexities and frequent errors
in real data, numerous researchers have demonstrated that their algorithms
can accurately reconstruct probability densities. A significant strength of
the Maximum Entropy principle is its capability to effectively manage data

uncertainty. Gomes et al. (2014) developed a methodology to estimate prob-
ability densities even in environments with noisy data, proving robust in sim-
ulations but needing further validation in real scenarios [51]. This capability
is particularly relevant in fields where data integrity may be compromised.
Lately, with the introduction of new methods, advancements in entropy
estimation techniques have played a vital role in enhancing the effectiveness
of the Maximum Entropy principle. In 2011, Behmardi et al. presented a
parametric entropy estimator that applied the Maximum Entropy principle
for approximating distributions without relying on local density estimation.
The estimator showed significant accuracy improvements over traditional
methods, especially in complex sensor network data scenarios, providing a
robust tool for various applications in signal processing and machine learn-
ing [13]. Although their parametric entropy estimator can avoid the pitfalls
of local density estimation, providing a more reliable tool for data analysis
in complex environments like sensor networks, it might not be as flexible
in handling diverse or non-standard data distributions without adjustments
or extensions to the model. This is where Lakshmanan and Pichler’s recent
study in 2023 introduced a novel approach to soft quantization using entropic
regularization, providing a promising alternative to traditional hard quanti-
zation methods that often lead to significant information loss. Their method,
grounded in robust theoretical formulations, demonstrates improved robust-
ness and signal fidelity across various data sets, especially in the presence
of noise and data variability. However, the practical implementation details
and computational efficiency of this approach are not extensively covered,
which could be crucial for real applications where computational resources
and processing time are limited [66].
Recent developments in this domain have shown varied applications and
improvements to the principle. From a theoretical perspective, one notable
application of the Maximum Entropy principle is in the reconstruction of an-
gular probability density functions (PDFs) from observed or simulated data.
In 2017, Gresele and Marsili explored both the philosophical and practical
aspects of the Maximum Entropy principle, highlighting its applicability across
various scientific domains. However, the study focused on theoretical
aspects only, lacking examples of direct empirical application [52].
A few years later, Amari et al. (2018) and Gzyl et al. (2021) have pro-
vided deeper insights into the mathematical underpinnings of the Maximum

Entropy principle, connecting it with significant statistical concepts like the
Wasserstein distance and Kullback-Leibler divergence [7, 53]. Although lacking
experimental results on simulations, these studies offer a comprehensive
understanding of the principle's integration with broader probability theory,
though their complexity may pose challenges for those without a strong
mathematical background. Wan et al. in 2019 [120] utilized Maximum En-
tropy to enhance the accuracy of Principal Component Analysis, providing a
clear advantage in data interpretation, although the method’s increased com-
putational demand may limit its applicability for large or real-time datasets.
These studies collectively underscore the importance and potential of
Maximum Entropy in a range of applications, from enhancing computational
algorithms to addressing real data challenges. The principle’s adaptability
and robustness, demonstrated across different scenarios and data complexi-
ties, support its continued use and development for complex systems analysis,
including air traffic complexity estimation.
In air traffic management, the precise reconstruction of angular distribu-
tions is crucial for assessing traffic complexity and managing congestion [75].
The Maximum Entropy principle is particularly well-suited for this task,
offering a robust method for dealing with the inherent uncertainties and
variabilities of air traffic data. This principle allows us to derive the most
generalized form of a probability distribution by maximizing entropy, sub-
ject to the empirical constraints provided by observed data, such as angular
means or variances.
Prior research has explored various approaches to angular density recon-
struction, with a particular focus on incorporating the principles of Maximum
Entropy to enhance the fidelity of the reconstructed PDFs.
The motivation for employing the Maximum Entropy principle in an-
gular density reconstruction lies in its ability to provide a principled and
data-driven approach to probability distribution estimation. Unlike para-
metric methods that rely on specific functional forms, Maximum Entropy
reconstruction allows for greater flexibility and adaptability to the under-
lying data distribution. This flexibility is particularly advantageous in the
context of air traffic complexity estimation, where traffic patterns can exhibit
significant variability and non-linearity.
In simulation settings, the Maximum Entropy principle is highly effec-

tive for reconstructing angular distributions, especially in air traffic man-
agement. This method involves setting constraints based on available data,
such as average angles and variances, to develop the most unbiased probabil-
ity distributions possible. By maximizing entropy within these constraints,
the principle ensures that the resulting angular distributions in simulations
accurately capture the diverse and unpredictable nature of air traffic pat-
terns. This approach not only improves the realism of the simulations but
also provides deeper insights into the complexities of air traffic dynamics.

3.3 Setting
In this chapter, we address the complexity of air traffic by modeling airspace
as a two-dimensional (2D) image, segmented into a Cartesian grid where each
intersection point denotes the center of a defined zone, referred to as a cell.
For the sake of simplicity, we employ circular cells. For each cell we apply
the followings steps:

(1) Select the trajectories that pass through the cell and determine, for each
one, the angles corresponding to entry and exit points, with respect
to a given fixed direction. From such a statistical sample, estimate,
via empirical means, the Fourier coefficients of the angular probability
distributions on the circle.

(2) Solve a maximum entropy problem to obtain probability distributions


that are as neutral as possible. From the mathematical viewpoint, this
step uses a Fenchel duality approach that enables us to solve an
infinite-dimensional entropy problem via the resolution of a finite-dimensional
concave smooth maximization problem.

(3) Use the optimal probabilities to estimate the local complexity.

Figure 3.1 gives an example to visualize a map of full trajectories and


the angular histogram of a selected cell. In Figure 3.1 (a), an airspace map
shows an array of intersecting flight trajectories within a vast grid, highlight-
ing the dense navigation challenges. A specific circular cell with numerous
trajectories is illustrated in Figure 3.1 (b). At this cell, the angular histogram
of these paths shows the diversity of movement through the concentrated
airspace, as illustrated in Figure 3.1 (c).

Figure 3.1: An example map of trajectories through a selected cell and the
angular histogram of the selected cell: (a) a map of full trajectories; (b) all
trajectories crossing a selected cell; (c) angular histogram of the selected cell.
Here we focus solely on the reconstruction of probability densities (the
first two steps). The actual application to the construction of complexity
maps is deferred to a subsequent chapter, currently being written. We now
specify the notations used and clarify our entropy model.

3.3.1 Computing moments


The first task is that of detecting trajectories that intersect each circular
cell S. We assume that all cells have the same radius δ. A cell can then be
written as
    S = {(x, y) | (x − u_1)² + (y − u_2)² ≤ δ²},        (3.1)

in which u = (u_1, u_2) is the coordinate vector of its center.

Further, we denote by n the number of trajectories T_j that intersect the
cell, and by J the set of indices of these trajectories:

    n = card J,   where   J = {j ∈ ℕ | T_j ∩ S ≠ ∅}.        (3.2)

Each trajectory that intersects the cell enters with an angle (with respect
to a fixed direction), which we call heading and denote by θ_j. The complex
number e^{iθ_j} is the corresponding point on the unit circle in ℂ.
We now regard the θ_j's as realizations of a random angle variable θ.
We are then interested in estimating the probability density p(θ) of such a
variable. From the angular sampling θ_j, we may build a set of empirical
moments. The Fourier coefficients of p are defined as

    a_l = (1/π) ∫_0^{2π} p(θ) cos(lθ) dθ,   l ∈ ℕ,        (3.3)

and

    b_l = (1/π) ∫_0^{2π} p(θ) sin(lθ) dθ,   l ∈ ℕ*.        (3.4)

The empirical coefficients

    x_l = (1/(πn)) Σ_{j∈J} cos(lθ_j)   and   y_l = (1/(πn)) Σ_{j∈J} sin(lθ_j)        (3.5)

are regarded as statistical estimators of a_l and b_l, respectively. Note in
passing that the estimator x_0 gives the exact value 1/π of a_0. Clearly, in
practice, these estimators will be meaningful for values of l limited to some
range {1, . . . , N}. We are then led to consider the data vector

    z = (x_1, y_1, . . . , x_N, y_N) ∈ ℝ^{2N}.

The parameter N corresponds to the highest frequencies that are meaningful


and useful, depending on the number of trajectories intersecting the cell S.
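
A minimal sketch of this estimation step (our own code; theta is a NumPy array holding the headings θ_j of the n trajectories crossing the cell):

    import numpy as np

    def empirical_fourier(theta, N):
        """Empirical Fourier coefficients of Eq. (3.5), stacked into the
        data vector z = (x_1, y_1, ..., x_N, y_N) in R^{2N}."""
        theta = np.asarray(theta)
        n = theta.size
        z = np.empty(2 * N)
        for l in range(1, N + 1):
            z[2 * l - 2] = np.cos(l * theta).sum() / (np.pi * n)
            z[2 * l - 1] = np.sin(l * theta).sum() / (np.pi * n)
        return z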

3.3.2 Formulating the Maximum Entropy problem

We now discuss the selection of a probability density for the angle. If we
had access to exact Fourier coefficients, we would maximize the entropy of p
under the constraint that the moments of p match these exact values. We would

then aim at solving the following constrained optimization problem:
    (P◦)   Min   H(p)
           s.t.  p ∈ L¹([0, 2π)),
                 1 = ∫_0^{2π} p(θ) dθ,
                 x_l = (1/π) ∫_0^{2π} p(θ) cos(lθ) dθ,   l ∈ {1, . . . , N},
                 y_l = (1/π) ∫_0^{2π} p(θ) sin(lθ) dθ,   l ∈ {1, . . . , N}.

Here, H(p) denotes the Shannon neg-entropy of p:

    H(p) = ∫_0^{2π} p(θ) ln p(θ) dθ.

However, we only have approximate values of the Fourier coefficients,
except for l = 0, where the value is exact. It is customary to relax the inexact
constraints, and we therefore aim at solving the following optimization
problem:

    (P)   Minimize   H(p) + (α/2) ∥z − Ap∥²_{Σ⁻¹}
          s.t.       1 = ∫_0^{2π} p(θ) dθ,
in which α balances the intensity of relaxation. Here, we use the following
notation:

• ∥ · ∥_{Σ⁻¹} denotes the function given by

      ∥z′∥_{Σ⁻¹} = √⟨z′, Σ⁻¹ z′⟩,

  in which Σ denotes the covariance matrix of the random vector z. In this
  optimization problem, x_l and y_l are considered as random variables
  because they are computed from independent samples of the random
  variable θ. As θ is random, so are x_l and y_l, which are the averages of
  the cosine and sine functions evaluated at these angles, respectively.

• A : L¹([0, 2π)) → ℝ^{2N} is the linear mapping defined by

      Ap = ∫_0^{2π} p(θ) T(θ) dθ,

  in which T(θ) := (cos θ, sin θ, . . . , cos Nθ, sin Nθ).

Therefore, in Problem (P), the squared Mahalanobis distance between Ap
and z is penalized, as a model fitting requirement. Note that Σ is not known
in practice. It will be necessary to estimate it. Using the Mahalanobis
distance requires, in principle, a positive definite estimate of Σ.
However, when the estimated covariance matrix is singular, the inverse does
not exist, and then we have to resort to a degenerate version of the Maha-
lanobis distance, given by

    ∥z′∥²_{Σ†} = ⟨z′, Σ† z′⟩  if z′ ∈ ran Σ,  and  ∞ otherwise,        (3.6)

in which Σ† denotes the pseudo-inverse of Σ and ran Σ denotes the range of Σ.


The infinite value of the corresponding penalization is, of course, equivalent
to a sharp constraint in problem (P).
Problem (P) pertains to partially finite convex programming [19, 20].
In Section 3.4 below, we give a detailed account of the dual approach to
solving such problems. A remarkable aspect is that Fenchel duality will
enable us to solve the infinite dimensional constrained problem (P) via the
corresponding finite dimensional dual problem, which turns out to be convex,
smooth and unconstrained. It should be noticed that the dual treatment of
problem (P) will avoid having to compute the inverse (or pseudo-inverse)
of the covariance matrix. Problem (P) will be solved via the so-called dual
problem, which turns out to be much better behaved, as will appear in the
sequel.

3.4 Duality in action

Our aim in this subsection is to show how Fenchel duality can help us solve
our optimization problems via the tractable dual problem. We first review a
few basic definitions and results. The reader unfamiliar with convex analysis
may refer to [100, 134].

3.4.1 Review of useful convex analysis
Let L be any real vector space. A function f : L → [−∞, ∞] is said to be
convex if its epigraph, the set

    epi f := { (x, α) ∈ L × ℝ | f(x) ≤ α },

is a convex subset of L × ℝ. It is said to be proper convex if it never takes the
value −∞ and is not identically equal to ∞. A function g : L → [−∞, ∞]
is said to be concave if −g is convex, and proper concave if −g is proper
convex. Notice that g is concave if and only if its hypograph

    hypo g := { (x, α) ∈ L × ℝ | g(x) ≥ α }

is convex. The effective domain of a convex function f is the set

    dom f = { x ∈ L | f(x) < ∞ }.

The effective domain of a concave function g is the set

    dom g = { x ∈ L | g(x) > −∞ }.

The only functions that are both convex and concave are the affine functions
and the functions identically equal to ±∞. Clearly, the domain of each
affine function is equal to L, as a convex or as a concave function. The other
remaining functions would give rise to specific notation if they were not so
rare.

It is customary to use indicator functions to encode constraints in
optimization problems. Recall that the indicator function of a subset C ⊂ L is
the function

    δ_C(x) := 0 if x ∈ C,  ∞ otherwise.

Now, let L and Λ be vector spaces paired (in the algebraic sense) by a
bilinear mapping

    ⟨·, ·⟩ : L × Λ → ℝ,   (x, ξ) ↦ ⟨x, ξ⟩.

A standard example is L = ℝ^d = Λ with the usual Euclidean scalar product.
The convex conjugate of a function f (that is convex or not) is the function

    f^⋆(ξ) := sup { ⟨x, ξ⟩ − f(x) | x ∈ L },   ξ ∈ Λ.

The concave conjugate of a function f (that is concave or not) is the function

    f_⋆(ξ) := inf { ⟨x, ξ⟩ − f(x) | x ∈ L },   ξ ∈ Λ.

It is remarkable that convex conjugacy acts as an involution on certain classes
of functions. For example, if f : ℝ^d → [−∞, ∞] is a lower-semicontinuous
proper convex function, then

    f^⋆⋆ := (f^⋆)^⋆ = f.

Given a convex subset C ⊂ ℝ^d, we call relative interior of C the interior of C
with respect to its affine hull aff C. Recall that aff C is the smallest affine
subspace that contains C. The relative interior of C is denoted by ri C. For
example, if C is a closed segment in ℝ², its interior is empty while its relative
interior is the segment without its ends. It can be shown that the relative
interior of a nonempty convex set is always nonempty.

Theorem 1 (Fenchel). Let f and g be functions on ℝ^d, respectively proper
convex and proper concave, such that

    ri dom f ∩ ri dom g ≠ ∅.        (3.7)

Then

    η := inf_{x∈ℝ^d} { f(x) − g(x) } = sup_{ξ∈ℝ^d} { g_⋆(ξ) − f^⋆(ξ) }

and the supremum is attained.

The above theorem asserts equality between the optimal values of two
problems, together with attainment in the second one. It is customary to
call these underlying optimization problems the primal problem and the dual
problem, respectively. In the above theorem, both the primal and dual are
finite dimensional. Moreover, in order to tackle problems such as (P), we
must put some linear mapping in the picture. This is the purpose of the next
theorem.

Theorem 2. Let the following be given:

1. L and Λ, real vector spaces;

2. ⟨·, ·⟩, a bilinear form on L × Λ;

3. A : L → ℝ^d, a linear mapping;

4. F : L → (−∞, ∞], a proper convex function;

5. g : ℝ^d → [−∞, ∞), a proper concave function.


Assume that A admits a formal adjoint mapping A^⋆, that is, a linear mapping
A^⋆ : ℝ^d → Λ such that ⟨Ax, y⟩ = ⟨x, A^⋆y⟩ for every x ∈ L and every y ∈ ℝ^d.
Then, under the qualification condition

    (QC)   ri(A dom F) ∩ ri(dom g) ≠ ∅,

one has

    η := inf_{x∈L} { F(x) − g(Ax) } = max_{λ∈ℝ^d} { g_⋆(λ) − F^⋆(A^⋆λ) }.

This theorem is the cornerstone of what is referred to as partially finite
convex programming. Various forms have appeared in the literature (see in
particular [19, 20]). The selected form is as in [79], where no topological structure
on the infinite-dimensional side is required. The optimization problems

    Minimize F − g ∘ A   and   Maximize g_⋆ − F^⋆ ∘ A^⋆

are respectively referred to as the primal and dual problems. The function
D := g_⋆ − F^⋆ ∘ A^⋆ appearing in the dual problem is referred to as the dual
function. The theorem asserts the equality between the optimal values of the
primal and dual problems, together with dual attainment. The next result
will provide conditions that will guarantee primal attainment as well.
Theorem 3 (Primal attainment). With the notation and assumptions of the
previous theorem, assume in addition that

    (QC^⋆)   ri dom g_⋆ ∩ ri dom(F^⋆ ∘ A^⋆) ≠ ∅.

Suppose further that

(a) F^⋆⋆ = F and g_⋆⋆ = g;

(b) there exists λ̄ dual optimal and x̄ ∈ ∂F^⋆(A^⋆λ̄) such that F^⋆ ∘ A^⋆ has
gradient Ax̄ at λ̄.

Then x̄ is primal optimal.

The latter result provides not only a condition for primal attainment,
but it also hints at the possibility of an explicit link between
primal and dual solutions. The bi-conjugate relationships in Assumption (a)
are central in the theorem, and the difficulty in our problem is to prove that
the entropy satisfies this property. It turns out that, in our context, it is
possible to compute the conjugate of the Shannon entropy by conjugating
through the integral sign.
Generally speaking, an integral functional is a functional of the form

    H(u) = ∫_Ω h(u(ω), ω) dµ(ω),   u ∈ L.

Here, Ω is assumed to be endowed with a σ-algebra of measurable sets and


with a measure denoted by µ; the function h is called the integrand, and the
argument u is assumed to pertain to some space of measurable functions L.
In our context, it is enough to consider such a functional on the familiar
space L = L1 ([0, 2π)), implicitly endowed with the Borel σ-algebra and the
Lebesgue measure. Moreover, the dependence of h on its second argument is not
vital here, and we are only interested, in this chapter, in the case where
h(u(ω), ω) = h◦(u(ω)), with

    h◦(t) = t ln t   if t > 0,
            0        if t = 0,
            ∞        if t < 0.

Clearly, h◦ is a lower semi-continuous proper convex function and, as such,
satisfies h◦^⋆⋆ = h◦. Conjugating H is elegantly performed by conjugating the
integrand, as we shall see now. Following Rockafellar, we say that:

A space L of measurable functions is decomposable if it is stable


under bounded alteration on sets of finite measure.

Otherwise expressed, L is decomposable if and only if it contains all functions
of the form

    1_T u◦ + 1_{T^c} u,

in which T has finite measure, u◦ is a measurable function such that the
set u◦(T) is bounded, and u is any member of L. Here, 1_T denotes the
characteristic function of T: 1_T(x) equals 1 if x ∈ T and zero otherwise;
and T^c denotes the complement of T. It is easy to see that L^p-spaces are
decomposable, which includes our workspace L¹([0, 2π)).

Theorem 4 (Rockafellar). Let L and Λ be spaces of measurable functions
on Ω paired by means of the standard integral bilinear form

    ⟨f, φ⟩ = ∫ f(ω) φ(ω) dω.

Let H be the functional of integrand h◦, that is,

    H(u) = ∫ h◦(u(ω)) dω,

with h◦ proper convex and lower semi-continuous. Assume that L is
decomposable and that H has nonempty effective domain. Then

    H^⋆(φ) = ∫ h◦^⋆(φ(ω)) dω

for every φ ∈ Λ, and H^⋆ is convex on Λ.

Applying the latter theorem with L = L¹([0, 2π)), Λ = L^∞([0, 2π))
and h◦ the above-defined integrand of the Shannon entropy, we see that, in
our case,

    H^⋆(φ) = ∫_0^{2π} h◦^⋆(φ(ω)) dω.

As can be seen from the appendix, the function h◦^⋆ is given by

    h◦^⋆(τ) = exp(τ − 1),   τ ∈ ℝ.

Finally, since Λ = L^∞([0, 2π)) is also decomposable, we obtain that

    H^⋆⋆(u) = ∫_0^{2π} h◦^⋆⋆(u(ω)) dω = ∫_0^{2π} h◦(u(ω)) dω = H(u).

We conclude that our entropy satisfies the bi-conjugacy relationship re-


quested in Theorem 3.
Before returning to our specific problem, let us state one more theo-
rem, in which an explicit relationship between primal and dual solutions is
obtained.

Theorem 5 (Primal-dual relationship). With the notation and assumptions
of Theorem 2, assume in addition that dom D has nonempty interior and that H is
an integral functional of integrand h such that conjugacy through the integral
sign is permitted. Assume that, as in Theorem 3, H^⋆⋆ = H and g_⋆⋆ = g.
Assume finally that the conjugate integrand h^⋆ is differentiable over ℝ, and
that there exists some dual-optimal vector λ̄ in int dom D. If

    ū(ω) := (h^⋆)′([A^⋆λ̄](ω)),   ω ∈ Ω,

then ū is a primal solution.

We are now ready to get back to our specific entropy problem.

3.4.2 Computing Maximum Entropy densities


On denoting by I the linear mapping given by

    Ip = ∫_0^{2π} p(θ) dθ,   p ∈ L¹([0, 2π)),

Problem (P) can be written as

    Minimize H(p) − g◦(A◦p)

over the space L¹([0, 2π)), in which

    A◦p = (Ip; Ap) ∈ ℝ × ℝ^{2N} = ℝ^{1+2N}

and g◦ is the function on ℝ^{1+2N} given by

    g◦(η◦, η) = −(α/2) ∥z − η∥²_{Σ⁻¹} − δ_{{1}}(η◦).

Clearly, the use of the indicator function δ_{{1}}(·) enables us to encode the
(unrelaxed) normality constraint Ip = 1. Straightforward computations show that
the adjoint mapping A◦^⋆ : ℝ^{1+2N} → L^∞([0, 2π)) is given by

    A◦^⋆(λ◦, λ)(θ) = λ◦ + ⟨λ, T(θ)⟩.

Moreover,

    (g◦)_⋆(λ◦, λ) = λ◦ + ⟨z, λ⟩ − (1/(2α)) ∥λ∥²_Σ.
64
Accounting for the fact that, as we have seen above, the entropy can be
conjugated by conjugating through the integral sign, the dual problem reads:

    Maximize D(λ◦, λ) := λ◦ + ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ
                         − exp(λ◦ − 1) ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ.

The function D is concave, finite and differentiable on ℝ^{1+2N}. Its stationary
points must satisfy the system

    0 = 1 − exp(λ̄◦ − 1) ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ,

    0 = z − (1/α) Σλ̄ − exp(λ̄◦ − 1) ∫_0^{2π} T(θ) exp⟨λ̄, T(θ)⟩ dθ,

which reduces to

    0 = z − (1/α) Σλ̄ − ( ∫_0^{2π} T(θ) exp⟨λ̄, T(θ)⟩ dθ ) / ( ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ ).

It should be noticed that the above system is also the optimality system of
the problem

    (D̃)   Maximize   ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ
           s.t.       λ ∈ ℝ^{2N}.

Proposition 6. The function

    D̃(λ) := ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ

to be maximized in Problem (D̃) is concave, smooth and everywhere finite.
Its gradient is given by

    ∇D̃(λ) = z − (1/α) Σλ − ( ∫_0^{2π} T(θ) exp⟨λ, T(θ)⟩ dθ ) / ( ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ ).

The function h◦^⋆(τ) = exp(τ − 1) obviously meets the requirements of
Theorem 5. Provided we can obtain a dual solution (λ̄◦, λ̄), the optimal
density is then given by

    p̄(θ) = exp[ λ̄◦ − 1 + ⟨λ̄, T(θ)⟩ ] = exp⟨λ̄, T(θ)⟩ / ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ,        (3.8)

where λ̄ maximizes the function D̃.

Remark 7. The primal dual relationship obtained above shows that the
solution to our Maximum Entropy problem exists and belongs to the expo-
nential family. The particular form in Equation (3.8) is well known. In the
unrelaxed form of Problem (P), it can be obtained using classical inequal-
ities (see e.g. [64], cited in [49]). It is worth noticing that, in [50], similar
results are obtained without resorting to Fenchel duality. However, we wish
to stress some advantages one may have to resort to Partially Finite Convex
Programming. First, we notice that the latter formalism can deal in a very
elegant way with problems in which constraints are relaxed. More impor-
tantly, by using our approach, we learn that the appropriate value of the
Lagrange multiplier maximizes the well behaved dual function, which offers
a powerful way to derive a computationally efficient algorithm. Finally, we
see that the proposed framework may be used with a variety of entropy
functionals (such as Burg [23], Renyi [97], Tsallis [117, 80], etc.). Although this
goes beyond the scope of this chapter, it certainly deserves to be mentioned.

3.4.3 Deriving an algorithm

Main computational aspects

The above developments yield the following algorithm for the computation
of Maximum Entropy densities.

Algorithm 2 Computing Maximum Entropy densities
1: Input: N ∈ ℕ*, z ∈ ℝ^{2N}, α > 0.
2: Output: The Maximum Entropy angular probability distribution.
3: Maximize the dual function

       D̃(λ) := ⟨λ, z⟩ − (1/(2α)) ∥λ∥²_Σ − ln ∫_0^{2π} exp⟨λ, T(θ)⟩ dθ.

4: From the dual optimal solution λ̄ obtained in the previous step, form the
   optimal density

       p̄(θ) = exp⟨λ̄, T(θ)⟩ / ∫_0^{2π} exp⟨λ̄, T(θ)⟩ dθ.

Maximizing the above dual function is easy, since D̃ is concave, smooth
and ℝ-valued. Such maximization can be efficiently performed by means of
a quasi-Newton method such as the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) algorithm [22], which requires the evaluation of D̃ and its gradient
only.
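
The sketch below illustrates this step with SciPy's BFGS implementation, maximizing D̃ by minimizing −D̃. It is our own sketch rather than the thesis code: the integrals are approximated by a rectangle rule on a grid of [0, 2π), and Sigma stands for an estimate of the covariance matrix Σ (note that, consistently with the dual formulation, Σ is never inverted):

    import numpy as np
    from scipy.optimize import minimize

    def trig_features(theta, N):
        """Rows are T(θ) = (cos θ, sin θ, ..., cos Nθ, sin Nθ) on the grid."""
        l = np.arange(1, N + 1)
        T = np.empty((theta.size, 2 * N))
        T[:, 0::2] = np.cos(np.outer(theta, l))
        T[:, 1::2] = np.sin(np.outer(theta, l))
        return T

    def maxent_density(z, Sigma, alpha, N, M=2000):
        """Maximize the dual function of Proposition 6 (BFGS on its negative)
        and return the Maximum Entropy density of Eq. (3.8) on a grid."""
        theta = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
        w = 2.0 * np.pi / M                      # rectangle-rule weight
        T = trig_features(theta, N)

        def neg_dual(lam):
            e = np.exp(T @ lam)                  # exp<λ, T(θ)> on the grid
            Z = w * e.sum()                      # quadrature of the integral
            val = lam @ z - (lam @ Sigma @ lam) / (2.0 * alpha) - np.log(Z)
            grad = z - (Sigma @ lam) / alpha - (T.T @ e) / e.sum()
            return -val, -grad

        res = minimize(neg_dual, np.zeros(2 * N), jac=True, method="BFGS")
        e = np.exp(T @ res.x)
        return theta, e / (w * e.sum()), res.x   # grid, density, dual solution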

How to choose N and α?

In the Maximum Entropy problem, the data fitting term

    (α/2) ∥z − Ap∥²_{Σ⁻¹}

depends on the choice of

• the highest frequency N to be accounted for;


• the estimated covariance matrix Σ;
• the relaxation parameter α.

We now describe our strategy for their empirical determination.


It is intuitively clear that the information provided by higher frequencies
tends to be irrelevant: in the spirit of the Maximum Entropy strategy, we

are looking for the most unbiased information on the underlying random
variable (the angle θ) so that fine (high resolution) details are unwanted;
moreover, the empirical Fourier coefficients are expected to become poorer
approximations of the true ones as the frequency l increases, in particular
when the number of angular samples is low. It is therefore only natural to put
a limit N on the highest frequency to be accounted for. The dimension of the
dual variable will then be 2N . In the context of simulated random angles, we
observe that the Kullback-Leibler entropy of the true density p◦ relative to
the reconstructed density p decreases as N increases, and stabilizes beyond
some value of N . Otherwise expressed, beyond a certain value of N , the
gain in information provided by additional Fourier coefficients is negligible
or non-existent.
For example, Figure 3.2 illustrates that there is no gain beyond N = 50
in our simulation, indicating that higher frequencies do not substantially
improve the approximation of the true distribution. This finding supports the
principle of using a limited number of Fourier coefficients to avoid overfitting
and unnecessary computational complexity in the estimation process.

Figure 3.2: Kullback–Leibler relative entropy versus highest frequency.

It now remains to select the value of the relaxation parameter α. In


the literature on inverse problems, one can find a variety of parameter se-

lection rules. Among these rules, the Morozov discrepancy principle states
that α should be such that the corresponding solution p̄ should give a residual
∥z − Ap̄∥ equal to a number strictly greater than, but close to, the estimated
size ρ of the error on the data. Therefore, in order to implement the Moro-
zov principle, a reasonable upper bound on the error should be known, an
assumption that we make in the context of this chapter.
Our strategy is as follows: starting from an initial guess α◦ > 0, we
compute the corresponding Maximum Entropy solution, which we now de-
note by p̄α◦ . We then compute the residual ∥z − Ap̄α◦ ∥. If the latter residual
is above the value 1.105ρ, then set α1 > α◦ ; otherwise set α1 < α◦ . We
then iterate on this process, until α gives a residual approximately equal to
1.105ρ. In practice, a dichotomy strategy can be used so as to converge to the
appropriate value of α. The detailed information can be seen in Algorithm 3.

Algorithm 3 Determining α by means of the Morozov discrepancy principle

1: Input: ρ, stopping condition (e.g. 1.095ρ ≤ residual ≤ 1.105ρ), µ (e.g. µ = 1.2),
   maximum number of iterations Nmax
2: Output: Morozov value of α
3: Set i = 0, α_min = 0, α_max = ∞, α_0 = 1
4: while condition is not satisfied and i < Nmax do
5:    Compute the Maximum Entropy solution p̄_{α_i}
6:    Compute the residual ∥z − Ap̄_{α_i}∥
7:    if residual < 1.095ρ then
8:        Set α_max = α_i
9:        Set α_{i+1} = (α_min + α_max)/2
10:   end if
11:   if residual > 1.105ρ then
12:       Set α_min = α_i
13:       Set α_{i+1} = (α_min + α_max)/2 if α_max < ∞, and α_{i+1} = µ α_i otherwise
14:   end if
15:   Set i = i + 1
16: end while
17: Return the last value of α
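
A compact Python transcription of this dichotomy (our own sketch; maxent_density refers to the hypothetical solver sketched in Section 3.4.3, and A_of_p numerically computes the Fourier moments Ap of a density given on the grid):

    import numpy as np

    def morozov_alpha(z, Sigma, N, rho, A_of_p, mu=1.2, n_max=50):
        """Search for alpha following Algorithm 3 (Morozov principle)."""
        a_min, a_max, alpha = 0.0, np.inf, 1.0
        for _ in range(n_max):
            theta, p, lam = maxent_density(z, Sigma, alpha, N)
            residual = np.linalg.norm(z - A_of_p(theta, p))
            if 1.095 * rho <= residual <= 1.105 * rho:
                break                           # Morozov condition met
            if residual < 1.095 * rho:          # residual too small: decrease alpha
                a_max = alpha
                alpha = 0.5 * (a_min + a_max)
            else:                               # residual too large: increase alpha
                a_min = alpha
                alpha = 0.5 * (a_min + a_max) if np.isfinite(a_max) else mu * alpha
        return alpha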

3.5 Simulations

3.5.1 Validation with Dirac distributions


We conduct testing on a single cell to assess the fundamental functioning of
the model. The results from this step provide essential insights into accuracy
and basic performance.
We start by performing simulations. We shall:

(a) generate an angular sample following a probability p◦ ;


(b) compute the corresponding empirical Fourier coefficients;
(c) compute the Maximum Entropy density that is compatible, in the re-
laxed setting described above, with our empirical Fourier coefficients.

This simulation will allow us to get an idea of the sample size that is necessary
for obtaining relevant information, merely by comparing the true density with
the one obtained through the maximization of entropy.
We start with the design of densities on [0, 2π). To this end, we recall
the definition of the bump function:

    ψ◦(θ) = exp( −1/(1 − θ²) )   if θ ∈ (−1, 1),   and   0 otherwise.

Recall that ψ◦ is a C^∞-function, whose support is indeed the compact interval
[−1, 1]. We can emulate the behaviour of a Dirac distribution by letting

    ψ_β(θ) := (1/(βc)) ψ◦(θ/β),   in which   c := ∫ ψ◦(θ) dθ.

Here, β is a strictly positive parameter. The closer β is to zero, the more ψ_β
is concentrated about the origin. It is readily seen that the integral of ψ_β is
equal to 1 whatever the value of β, and that the support of ψ_β is the
interval [−β, β].

We may now define, for the purpose of simulation, densities of the form

    p◦(θ) = Σ_{k=1}^{p} c_k ψ_{β_k}(θ − θ_k),        (3.9)

in which the ck ’s are positive numbers that sum up to 1. The translation
parameters θk are chosen in such a way that the support of p◦ is contained
in [0, 2π]. We then simulate angular samples that follow such a distribution.
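
For the simulations, samples from such a mixture of bumps can be drawn, for instance, by rejection sampling. The sketch below is our own code (not the thesis implementation); c, beta and centers hold the parameters c_k, β_k and θ_k:

    import numpy as np

    def bump(t):
        """Unnormalized bump, a C-infinity function supported on [-1, 1]."""
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        inside = np.abs(t) < 1.0
        out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
        return out

    # Normalization constant c = integral of the bump, via a Riemann sum.
    _grid = np.linspace(-1.0, 1.0, 4001)
    _norm = bump(_grid).sum() * (_grid[1] - _grid[0])

    def p0(theta, c, beta, centers):
        """Mixture density of Eq. (3.9); the weights c must sum to 1."""
        theta = np.atleast_1d(theta)
        return sum(ck / (bk * _norm) * bump((theta - tk) / bk)
                   for ck, bk, tk in zip(c, beta, centers))

    def sample_p0(n, c, beta, centers, seed=0):
        """Draw n angles from the mixture on [0, 2π) by rejection sampling."""
        rng = np.random.default_rng(seed)
        m = 1.01 * p0(np.linspace(0, 2 * np.pi, 4001), c, beta, centers).max()
        out = []
        while len(out) < n:
            t = rng.uniform(0.0, 2.0 * np.pi, size=n)
            u = rng.uniform(0.0, m, size=n)
            out.extend(t[u < p0(t, c, beta, centers)])
        return np.asarray(out[:n])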
To visually demonstrate the effectiveness of our simulation in capturing
the original distribution’s characteristics, we proceed with a specific example.
Figure 3.3 shows a simulation example based on the original probability
p◦(θ) with two peaks and β = 0.1: it depicts the underlying (original)
probability density p◦(θ) and the histogram obtained by simulation.

Figure 3.3: Simulated angular samples based on an original probability p◦.

Following the successful validation of the simulation, we proceed to refine
our model to enhance its performance further.

Figure 3.4: Gradient norm when N = 40.

Figure 3.4 illustrates the convergence behavior of an optimization pro-


cess with the gradient norm plotted against the iteration number. Initially,
there is a sharp decrease from the first to the second iteration, indicating
a rapid improvement in the optimization process. Following this, the curve
shows a rapid convergence of the gradient norm to zero.

3.5.2 Numerical implementations

Morozov principle for the selection of α

In this section, we investigate the impact of the regularization parameter α


on the selection of the optimal solution in the inverse problem.
We have conducted an algorithmic search to choose the appropriate value
for α, using the Morozov principle as described in Algorithm 3. The results
of the algorithm reveal the relationship between α and the residual as shown
in Figure 3.5.

Figure 3.5: Residual values for α ranging from 1 to 10000.

As α increases from 1 to 100, the residual gradually decreases from ap-


proximately 0.771 to about 0.711. However, once α surpasses the value of 100,
further increases do not significantly reduce the residual. This indicates that
selecting an appropriate value for α is vital to achieve the desired residual.
The findings from this analysis are crucial for understanding how the
value of α impacts the algorithm's performance in real scenarios, suggesting
that a value of α around 100 is likely optimal for minimizing the residuals
without unnecessary computational expenditure.

Reconstructions

After determining the optimal parameter values, we compute the optimal
density by reconstructing the original distribution. Figure 3.6 and Figure 3.7
present a comparative analysis between the original and optimized proba-
bility density functions, reflecting the outcome of an optimization process
aimed at reconstructing the original distribution.
In the two-peak scenario (as shown in Figure 3.6), the optimal density
closely aligns with the original density, demonstrating the algorithm’s suc-
cess in capturing the essential features of the distribution, namely the peak
locations and heights. It indicates that the optimization process is robust in

simpler scenarios with fewer peaks.
In the more complex five-peak scenario (as shown in Figure 3.7), the
reconstructed density still captures the general shape and peak positions of
the original density, which is commendable given the increased complexity.
In all situations, the reconstructed density closely mirrors the shape of
the original density, successfully capturing the positions and general form of
the peaks. However, there are minor differences, particularly in the central
peak where the optimal density does not perfectly align, hinting at slight
deviations. Despite these small deviations, the optimized distribution is a
close approximation of the original distribution.

Figure 3.6: Results on the Original and Optimal densities with two peaks.

The slight deviations in peak heights and widths provide avenues for re-
finement, but overall, the optimization algorithm appears to be performing
effectively. It is successful in identifying and adapting to the multiple modes
of the distribution, a particularly positive outcome within the challenging
multi-peak environment. Further fine-tuning
of the algorithm might lead to even closer alignment, enhancing the model’s
precision.
Figure 3.7: Results on the Original and Optimal densities with five peaks.

The effectiveness of the BFGS algorithm in maximizing concave, smooth,
and ℝ-valued dual functions is further explored by comparing its performance
with several other optimization methods. These include Powell [92],
Nelder-Mead [85], Conjugate Gradient (CG) [54], Truncated Newton
Conjugate-Gradient (TNC) [43], Constrained Optimization BY Linear Approximations
(COBYLA) [91], and Trust Region Constraint (Trust-Constr) [84].
Some basic information about these approaches is presented as follows:

1. BFGS method. It is a quasi-Newton method used to solve unconstrained
   optimization problems by approximating the Hessian matrix
   of the objective function. Instead of directly computing the Hessian
   matrix, BFGS uses gradient-based updates to build an approximation B_k
   of the Hessian. The update formula for B_{k+1} at the (k+1)-th iteration is:

       B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

   where:

   • s_k = x_{k+1} − x_k is the difference between successive points.

   • y_k = ∇f(x_{k+1}) − ∇f(x_k) is the change of gradients.

   In the next step, BFGS computes x_{k+1} by using the inverse of B_k:

       x_{k+1} = x_k − α_k B_k^{−1} ∇f(x_k).

2. Powell method. It is a derivative-free optimization method using a
series of linear searches to find the minimum value of a function. The
optimization process begins with a set of initial directions (usually the
unit vectors of the coordinate axes). At each iteration, the method
performs a linear search along each of these directions to reduce the
value of the objective function f (x).
Given a set of directions {d1 , d2 , . . . , dn }, this method finds the optimal
point xk+1 by performing a minimization along each direction di as:

xk+1 = xk + λi di ,

where λi is a scalar chosen to minimize f (x) along the direction di . Af-


ter each line search sequence, the method updates the optimal direction
by adding the direction from the starting point to the ending point of
the search sequence and replacing the direction that caused the least
reduction in the objective function. The new direction is computed as:

dnew = xn − x0 .

3. Nelder-Mead method. It is a derivative-free optimization method that
   keeps track of n + 1 points of interest in ℝ^n, whose convex hull forms a
   simplex, to find the minimum of the objective function f(x). Given a
   simplex S with vertices {x_1, x_2, . . . , x_{n+1}}, we compute f(x) at each
   vertex of the simplex and determine x_l, x_h, x_s corresponding to the best,
   worst and second-worst vertices based on the values of f(x). Operations
   on the simplex include reflection, expansion, contraction, and
   shrinkage.

• Reflection: After computing the centroid xc of the simplex, move


the worst point xh through xc to generate a new point xr :

xr = xc + α(xc − xh )

where α > 0 is the reflection coefficient.


• Expansion: If reflection reduces the function value, try expand-
ing further to find a better point xe :

xe = xc + γ(xr − xc )

with γ > 1.

• Contraction: If reflection fails, try contracting the point xr to-
wards xc :
xc = xc + ρ(xh − xc )
with 0 < ρ < 1.
• Shrinkage: If contraction also fails, shrink the entire simplex
towards the best point xl :

xi = xl + σ(xi − xl )

with 0 < σ < 1.

4. CG method. It is a powerful optimization technique, particularly effec-


tive for large-scale problems where the objective function is a quadratic
function. At each iteration k, the method finds a conjugate direction
dk by combining the current gradient with the previous direction:

dk = −∇f (xk ) + βk−1 dk−1

where the scalar β_{k−1} is computed as:

       β_{k−1} = ( ∇f(x_k)^T ∇f(x_k) ) / ( ∇f(x_{k−1})^T ∇f(x_{k−1}) ).

The next point xk+1 is updated as:

xk+1 = xk + αk dk

where αk is the optimal step size, usually determined by a line search.

5. TNC method. It is a hybrid method that combines the conjugate gradi-


ent method with the Newton method, particularly useful for large-scale
optimization problems. Instead of using the full Hessian matrix, TNC
only uses part of it, reducing computational complexity. The Newton
step pk is computed by solving:

Bk pk = −∇f (xk )

where Bk is the Hessian matrix at point xk .

In TNC, pk is approximated by truncating the iterations of the conju-
gate gradient method, finding a near-optimal solution. The next point
is updated as:
xk+1 = xk + αk pk
where αk is the optimal step size.

6. COBYLA method. It is a derivative-free optimization method that


handles constraints by approximating both the objective function and
the constraints using linear models. At each step, a linearized sub-
problem within a trust region is solved to find the optimal solution
while ensuring that the constraints are satisfied. Suppose the original
objective function is f (x) and the constraints are gi (x) ≤ 0. COBYLA
approximates them with linear functions:

fˆ(x) = f (x0 ) + ∇f (x0 )T (x − x0 ),

gˆi (x) = gi (x0 ) + ∇gi (x0 )T (x − x0 ).


The optimization is performed on these linearized functions, and the
result is used to update the current point.

7. Trust-Constr method. It optimizes a function within a trust region


around the current point, using a local model of the objective function.
Given a local model mk (p) and a trust-region radius ∆k , the subproblem
to solve is:

       min_p m_k(p)   subject to   ∥p∥ ≤ ∆_k.

The model m_k(p) is typically a quadratic approximation of the objective
function:

       m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2) p^T B_k p

where B_k is an approximation of the Hessian matrix. If p_k is the
solution to the subproblem, the new point is updated as:

       x_{k+1} = x_k + p_k.

The trust-region radius ∆k may be adjusted based on the accuracy of


the model.

Figure 3.8: Comparison of the computational times of seven optimization
methods in the five-peak scenario.

Each of these methods, like BFGS, primarily relies on evaluations of
the objective function and, where applicable, its gradient to find optimal
solutions. The assessment of
these methods focuses on computational time (see Figure 3.8). In this figure,
COBYLA and BFGS methods are noted for their rapid execution times, mak-
ing them preferable for applications requiring quick data processing. In con-
trast, Powell and TNC methods require longer computational times, which
could be a disadvantage in time-sensitive applications.
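
Such a comparison can be scripted directly with SciPy, since all seven methods are exposed through a common interface. The sketch below is our own schematic illustration; the test objective is a smooth stand-in, and in practice one would pass the negative dual function −D̃:

    import time
    import numpy as np
    from scipy.optimize import minimize

    METHODS = ["BFGS", "Powell", "Nelder-Mead", "CG", "TNC", "COBYLA",
               "trust-constr"]

    def compare(fun, x0):
        """Time each optimizer on the same objective function."""
        for method in METHODS:
            t0 = time.perf_counter()
            res = minimize(fun, x0, method=method)
            dt = time.perf_counter() - t0
            print(f"{method:>12s}  f = {res.fun: .6f}  time = {dt:.3f} s")

    # A smooth stand-in objective playing the role of the negative dual.
    compare(lambda x: np.sum((x - 1.0) ** 2) + np.log1p(np.sum(x ** 2)),
            x0=np.zeros(10))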

3.6 Towards air traffic complexity estimation


The application of the Maximum Entropy method offers a novel avenue for
modeling and visualizing the complexities inherent in air traffic management.
By reconstructing angular probability densities from simulated trajectory
data, this method allows for a nuanced analysis of traffic dynamics, par-
ticularly in terms of predictability and potential trajectory conflicts within
specific airspace sectors.

Various measures of complexity may be envisaged from the obtained
probability density. In the context of aircraft trajectories, complexity mea-
sures are used to understand and quantify the level of predictability and
potential conflicts in these trajectories. It can be mentioned by some key
points:

• Divergence between entrance and exit density: This concept suggests


that by comparing the angular density at the entrance of a specific
region (cell S) with the angular density at the exit, we can determine
if aircraft tend to change their paths within that region. If there is a
significant difference between the two densities, it indicates that aircraft
are altering their trajectories in that region. This divergence signifies
a lack of predictability and contributes to the complexity of the overall
air traffic system.

• Counting peaks in entrance density: Another method for assessing com-


plexity is to count the number of peaks in the entrance density. Each
peak may represent a concentration of aircraft in a specific direction
or route. If there are multiple peaks, it suggests that different aircraft
are following distinct paths when entering the region. A single peak
indicates that most aircraft tend to follow the same direction, which
results in lower conflict and, consequently, lower complexity.

• Weighting complexity by traffic intensity: It is important to consider


the intensity of air traffic when evaluating complexity. The number of
trajectories (n) within the specific region (cell S) is used as a measure
of traffic intensity. Complexity should be weighed or adjusted based
on the volume of traffic. A high level of complexity might be more
concerning if it occurs in regions with heavy air traffic compared to
regions with low traffic.

In this context, the goal is to use these measures to understand the


patterns and behaviors of aircraft trajectories. Higher complexity, indicated
by divergence, multiple peaks, or significant traffic, could imply a need for
more sophisticated air traffic management strategies to ensure safety and
efficiency. These measures help in assessing and optimizing air traffic control
systems.
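
To make the first two criteria concrete, the sketch below gives one possible numerical reading (our own illustration, not a method prescribed by the thesis); p_in and p_out are entrance and exit densities evaluated on a common grid with spacing dtheta:

    import numpy as np

    def kl_divergence(p_in, p_out, dtheta):
        """Kullback-Leibler divergence between entrance and exit densities."""
        mask = (p_in > 0) & (p_out > 0)
        return np.sum(p_in[mask] * np.log(p_in[mask] / p_out[mask])) * dtheta

    def count_peaks(p, min_height=0.0):
        """Count local maxima of a density sampled on a circular grid."""
        left, right = np.roll(p, 1), np.roll(p, -1)
        return int(np.sum((p > left) & (p > right) & (p > min_height)))

    def complexity_index(p_in, p_out, dtheta, n_traj):
        """Toy index combining divergence, multimodality and traffic volume."""
        return n_traj * (kl_divergence(p_in, p_out, dtheta) + count_peaks(p_in))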

3.7 Conclusions

In this chapter, we have presented a novel approach to constructing traffic
complexity maps for air traffic management by focusing on the reconstruction
of angular probability density functions from simulated trajectory data. The core
of our methodology is to use the Maximum Entropy principle, rigorously
incorporating the statistical properties of the Fourier coefficients of the un-
derlying angular density. This approach ensures that the local complexity is
accurately assessed.
Our application of partially finite convex programming, grounded in an
infinite-dimensional form of Fenchel’s duality theorem and the conjugacy of
integral functionals, effectively computes optimal densities. The implemen-
tation of this method presents a robust solution to the data fitting challenge
within the framework of angular density estimation.
The validation of our method through numerical simulations and the re-
construction of simulated angular data have demonstrated its proficiency in
precisely capturing the original angular probability densities. From two-peak
distributions to more complex five-peak scenarios, our algorithm consistently
achieved an optimal density that closely resembles the original distribution,
effectively preserving the defining characteristics of the distribution — such
as peak locations and heights — even as the complexity increases. This
consistency ensures that our method can provide reliable information for
air traffic complexity maps, enhancing the predictive capabilities and opera-
tional efficiency of air traffic management systems.
The increasing number of peaks, correlating with the number of aircraft
trajectory directions within a sector, further validates our method’s capabil-
ity to significantly improve the evaluation of air traffic complexity. This is
instrumental for real-time decision-making and strategic planning in air traf-
fic control, where understanding complex flight patterns is crucial for safety
and efficiency.
The potential applications of this methodology in real air traffic man-
agement are promising. By integrating real traffic data into this framework,
future research could refine and validate the models to enhance their practi-
cal utility. Additionally, expanding this approach to incorporate other vari-
ables such as altitude and speed could provide a more comprehensive tool

81
for complexity assessment, supporting more sophisticated air traffic control
strategies.
This research not only establishes a robust mathematical framework for
estimating angular density but also paves the way for significant advance-
ments in managing the complexities of growing air traffic. As the volume
of air traffic continues to increase, the relevance and necessity of deploying
advanced analytical tools such as those developed in this study will become
ever more critical.

Chapter 4

Building complexity maps of airspace via information geometry

Building air traffic complexity maps is crucial for enhancing air traffic man-
agement and ensuring safety. By understanding the local complexity, air
traffic controllers can better manage aircraft flow, reduce congestion, and
anticipate potential conflicts, ultimately leading to more efficient and safer
airspace operations. In this chapter, we introduce a methodology to assess
air traffic complexity through three steps. First, we estimate the probability
density of aircraft headings on a Cartesian grid using Maximum Entropy.
Next, we approximate the geodesic distance to the uniform distribution in
the Maksimov manifold. Finally, we compute a complexity index combin-
ing angular distribution and local traffic intensity to create a complexity
map. This index, which has been tested on real data, demonstrates reliabil-
ity, with alternative criteria possible by comparing heading distributions at
cell boundaries. Our approach is both numerically efficient and effective in
capturing air traffic complexity.

4.1 Introduction

4.1.1 Motivation
Air traffic complexity involves the dynamic interactions and trajectories of
aircraft within the airspace. This chapter presents a novel methodology for
constructing complexity maps that strategically anticipate and manage air traffic challenges. By developing advanced complexity measures, we aim
to enhance ATM systems and ensure efficient and safe operations through
pre-emptive planning.
Current airspace management faces significant constraints due to the
rigid structures and the cognitive limitations of ATCOs. They are tasked
with safely managing an increasing number of aircraft, especially within con-
gested sectors. Despite technological advancements in ATM, the combination
of fixed airspace configurations and controller workload constraints leads to
inefficiencies and potential safety risks. As air traffic continues to grow,
these challenges become more pronounced, necessitating innovative solutions
to maintain and improve safety and efficiency.
Proactive development of complexity measures and maps is essential for
effective ATM. Complexity maps provide valuable insights into traffic pat-
terns and system responses to various conditions, facilitating better strategic
planning. Lee et al. (2009) [68] demonstrated how complexity maps can op-
timize aircraft entry points and manage sector boundaries, while Hong et al.
(2014) [57] proposed a Variance Quadtree-based Map for resolving conflicts
by identifying optimal strategies for aircraft entries into sectors.
Further, studies by Juntama et al. (2020) [63] and Delahaye et al.
(2022) [36] showed that complexity maps are instrumental in optimizing 4D
trajectories and evaluating local traffic disorder. These tools are crucial
for enhancing air traffic control operations, reducing delays, and improving
overall airspace capacity by anticipating and managing complexity before it
arises. By leveraging complexity maps, ATM systems can better handle the
intricacies of air traffic, ensuring that safety and efficiency are maintained
even as traffic volumes increase.

4.1.2 Overview

Our methodology for constructing air traffic complexity maps begins with
discretizing the airspace using a Cartesian grid. At each grid point, we define
a circular cell with a radius compliant with safety regulations. For each cell,
we compute an angular distribution of trajectories using the principle of
Maximum Entropy.
The distribution thus obtained belongs to the family of Generalized von Mises (GvM) distributions. The determination of the probability density
of the underlying angular variable is achieved using Fenchel duality, where
empirical moment constraints are relaxed in accordance with the estimated
covariance matrix of the data, implying the use of Mahalanobis distance.
The details of this strategy were proposed in Chapter 3, and the key points
are summarized in Subsection 4.2.1. We then calculate the geodesic distance
between this distribution and the uniform distribution, where a smaller dis-
tance indicates higher local traffic complexity. The local complexity index
takes into account this distance and the local intensity of the traffic.
To manage the computational intensity of evaluating the geodesic dis-
tance, we employ a practical approximation using the symmetrized Kullback-
Leibler (SKL) divergence. This allows for efficient estimation of local traffic
complexity across the grid, making the methodology feasible for real-time
applications.

4.2 Deriving a complexity index via information geometry

Our objective in this section is to obtain a local complexity index using concepts from information theory. This construction consists of two parts.
In the first part, we construct an angular probability density at a given
point on the grid. The trajectories crossing the corresponding cell provide a
statistical sample of the angular variable, and we apply the principle of Max-
imum Entropy to determine an angular density consistent with the empirical
Fourier coefficients obtained. The constraints corresponding to the empirical
coefficients are relaxed based on the quality of the estimation, particularly
the number of samples (i.e., the number of trajectories crossing the cell). By
applying Fenchel duality, we can prove that the density maximizing entropy
belongs to the GvM density family. Since it is expressed via the maximizer
of the dual function, the solution actually appears in the Maksimov form.
The second part, which is the main focus of this chapter, involves eval-
uating the geodesic distance between the obtained distribution and the uni-
form angular distribution, which corresponds to the least organized traffic and hence the most difficult one to control.

We will see in Subsection 4.3.1 that the natural parametrization of the
GvM distribution manifold suggested by Fenchel duality, which corresponds
to the Maksimov form, theoretically allows for the calculation of this geodesic
distance by solving a relatively high-dimensional system of ordinary differ-
ential equations. For reasons of algorithmic and numerical simplicity, we will
then use the classical approximation of the geodesic distance by the sym-
metrized Kullback-Leibler divergence.

4.2.1 Maximum Entropy angular densities


Consider a cell centered at a grid point in the airspace, an example of which
is given in Figure 4.1. The empirical Fourier coefficients
$$x_l = \frac{1}{\pi n} \sum_{j \in J} \cos(l\theta_j) \quad\text{and}\quad y_l = \frac{1}{\pi n} \sum_{j \in J} \sin(l\theta_j) \qquad (4.1)$$

approximate the unknown Fourier coefficients of the searched angular distribution p(θ). We may assume to have access to the quality of this approximation, in the form of the covariance matrix Σ of the vector

$$\mathbf{T}(\theta) = \big(\cos\theta,\ \sin\theta,\ \ldots,\ \cos(N\theta),\ \sin(N\theta)\big).$$

We define z = (x_1, y_1, \ldots, x_N, y_N). The choice of the highest frequency N has been discussed in Chapter 3. In our Maximum Entropy approach, adequacy of the moments of the searched density p(θ) is required, which involves the linear operator A : L^1(S) → R^{2N} given by

$$Ap = \int_S p(\theta)\, \mathbf{T}(\theta)\, d\theta,$$

where S := R/2πZ. The attach term in our Maximum Entropy problem accounts for the covariance matrix Σ whenever an estimate of it is available. It is defined as

$$\|z - Ap\|_{\Sigma^\dagger}^2 = \begin{cases} \big\langle z - Ap,\ \Sigma^\dagger (z - Ap) \big\rangle & \text{if } z - Ap \in \operatorname{ran}\Sigma, \\[4pt] \infty & \text{otherwise}, \end{cases}$$

in which Σ† is the Moore-Penrose generalized inverse of Σ and ran Σ denotes the range of Σ. The infinite value forces the moment vector Ap to lie in

Figure 4.1: Example of a cell around a grid point in the airspace, with
trajectories across ATL on December 26, 2018.

the affine space {z} + ran Σ. If Σ is positive definite, then Σ† is merely the
inverse of Σ, and ∥z − Ap∥Σ† is the Mahalanobis distance between z and Ap.
The Maximum Entropy problem reads
$$(\mathcal{P})\qquad \text{Minimize } H(p) + \frac{\alpha}{2}\,\|z - Ap\|_{\Sigma^\dagger}^2 \qquad \text{s.t.}\quad 1 = \int_S p(\theta)\, d\theta.$$

Here, the entropy functional H is defined on L^1(S) by

$$H(p) = \int_S h\big(p(\theta)\big)\, d\theta \quad\text{with}\quad h(t) := \begin{cases} t \ln t & \text{if } t > 0, \\ 0 & \text{if } t = 0, \\ \infty & \text{if } t < 0, \end{cases}$$
and α > 0 fixes the balance between the entropy and the attach term. It was
shown in [86] that a Fenchel duality approach makes it possible to solve the infinite-dimensional constrained problem (P) via a finite-dimensional unconstrained smooth concave maximization problem, which moreover involves Σ rather than Σ†. The solution to Problem (P) is given by

$$\bar{p}(\theta) = \frac{\exp\big\langle \bar\lambda,\ \mathbf{T}(\theta)\big\rangle}{\displaystyle\int_S \exp\big\langle \bar\lambda,\ \mathbf{T}(\theta)\big\rangle\, d\theta}, \qquad (4.2)$$

in which λ̄ maximizes the function D̃ given by

$$\tilde{D}(\lambda) := \langle \lambda, z\rangle - \frac{1}{2\alpha}\,\|\lambda\|_\Sigma^2 - \ln \int_S \exp\langle \lambda,\ \mathbf{T}(\theta)\rangle\, d\theta.$$

Observe that:

• the dual function D̃ is everywhere finite, smooth and strongly concave, so that its maximization is numerically rather easy;

• the solution p̄, which is strictly positive and smooth on S by construction, belongs to the family of GvM distributions of degree N.

The generalized von Mises distribution of degree N, hereafter denoted by GvM_N, was studied in [49, 48]. Its canonical form is

$$p(\theta) = \frac{1}{G_0}\, \exp\left(\sum_{k=1}^{N} \kappa_k \cos k(\theta - \mu_k)\right), \qquad (4.3)$$

where

$$G_0 = \int_S \exp\left(\sum_{k=1}^{N} \kappa_k \cos k(\theta - \mu_k)\right) d\theta. \qquad (4.4)$$

The GvM_N distribution can be put in the Maksimov form [77]:

$$p(\theta) = \frac{1}{G_0}\,\varphi(\theta), \qquad (4.5)$$

where

$$\varphi(\theta) = \exp\left(\sum_{k=1}^{N} \kappa_k \cos\mu_k \cos k\theta + \kappa_k \sin\mu_k \sin k\theta\right), \qquad (4.6)$$

which is the canonical form of an exponential distribution:

$$p(\theta) = \exp\big[\langle \lambda,\ \mathbf{T}(\theta)\rangle - \ln G_0\big], \qquad (4.7)$$

with λ := (κ_1 cos µ_1, κ_1 sin µ_1, . . . , κ_N cos µ_N, κ_N sin µ_N), so that λ_{2k−1} = κ_k cos µ_k and λ_{2k} = κ_k sin µ_k for k = 1, . . . , N. Finally, for a discussion of the choices of the highest frequency N and the relaxation parameter α, the reader is invited to refer to Chapter 3 or the companion paper [86].
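The whole pipeline of this subsection (empirical Fourier coefficients, dual maximization, recovery of p̄) can be sketched numerically as follows. This is an illustrative simplification in which Σ is taken to be the identity, so that the Mahalanobis term reduces to a Euclidean one; the function name and the grid discretization of S are ours.

```python
import numpy as np
from scipy.optimize import minimize

def fit_gvm_maxent(headings, N=3, alpha=10.0, n_grid=512):
    """Minimal sketch of the Maximum Entropy fit of this subsection.

    headings : observed entrance angles (radians) in one cell.
    Returns (theta_grid, p_bar), the optimal GvM_N density on the grid.
    Assumes Sigma = I for simplicity; the full method plugs in an
    estimated covariance matrix of the empirical Fourier coefficients.
    """
    n = len(headings)
    ks = np.arange(1, N + 1)
    # Empirical Fourier coefficients z = (x_1, y_1, ..., x_N, y_N), Eq. (4.1).
    z = np.empty(2 * N)
    z[0::2] = np.cos(np.outer(ks, headings)).sum(axis=1) / (np.pi * n)
    z[1::2] = np.sin(np.outer(ks, headings)).sum(axis=1) / (np.pi * n)

    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    T = np.empty((2 * N, n_grid))            # feature vector T(theta)
    T[0::2] = np.cos(np.outer(ks, theta))
    T[1::2] = np.sin(np.outer(ks, theta))
    dtheta = 2.0 * np.pi / n_grid

    def neg_dual(lam):
        # -D~(lambda) with Sigma = I (illustrative simplification).
        log_int = np.log(np.sum(np.exp(lam @ T)) * dtheta)
        return -(lam @ z - np.dot(lam, lam) / (2.0 * alpha) - log_int)

    lam_bar = minimize(neg_dual, np.zeros(2 * N), method="BFGS").x
    w = np.exp(lam_bar @ T)
    return theta, w / (np.sum(w) * dtheta)    # normalized as in Eq. (4.2)
```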

4.2.2 The Fisher information metric


In this section, we recall some basic definitions and properties in information
geometry. The reader is referred to [6] for a comprehensive introduction.

Definitions and properties

Definition 8. A statistical model is a pair (M, p) where M is an oriented N-dimensional smooth manifold and (p_θ)_{θ∈M} is a parameterized family of probability densities on a measure space (Ω, T, µ) such that, putting p(θ, ω) = p_θ(ω):

• For µ-almost all ω ∈ Ω, the mapping θ 7→ p(θ, ω) is smooth.

• For any θ ∈ M, there exists an open neighborhood U_θ of θ and an integrable mapping h : Ω → R₊ such that, for any ξ ∈ U_θ, |∂_θ p(ξ, ω)| ≤ h.

• The mapping θ → pθ ∈ L1 (Ω, µ) is one-to-one.

• The support of pθ does not depend on θ.

Definition 9. Assuming p never vanishes, one can define the score l : M × Ω → R as

$$l(\theta, \omega) = \log p(\theta, \omega), \qquad (4.8)$$

or, in brief,

$$l_\theta = \log p_\theta. \qquad (4.9)$$

Proposition 10. The score satisfies

$$\mathbb{E}_{p_\theta}[\partial_i l_\theta] = 0, \qquad i = 1, \ldots, N, \qquad (4.10)$$

in which E denotes the expectation operator.

A simple computation shows that

$$\mathbb{E}[\partial_i l_\theta\, \partial_j l_\theta] = \int_\Omega \frac{\partial_i p_\theta\, \partial_j p_\theta}{p_\theta}\, d\mu(\omega) = 4 \int_\Omega \partial_i\big(\sqrt{p_\theta}\big)\, \partial_j\big(\sqrt{p_\theta}\big)\, d\mu(\omega), \quad i, j = 1, \ldots, N, \qquad (4.11)$$

proving that

$$g_{ij} = \mathbb{E}[\partial_i l_\theta\, \partial_j l_\theta] = \big\langle \partial_i\big(\sqrt{p_\theta}\big),\ \partial_j\big(\sqrt{p_\theta}\big) \big\rangle_{L^2(\Omega,\mu)}. \qquad (4.12)$$

Let g be the section of TM* ⊗ TM* defined by¹

$$g = g_{ij}\, d\theta^i \otimes d\theta^j. \qquad (4.13)$$

Now, given any tangent vector X = X^i ∂_i ∈ T_θM:

$$g(\theta; X, X) = g_{ij} X^i X^j = X^i X^j \big\langle \partial_i\big(\sqrt{p_\theta}\big),\ \partial_j\big(\sqrt{p_\theta}\big) \big\rangle_{L^2(\Omega,\mu)} \qquad (4.14)$$
$$= \big\langle X^i \partial_i\big(\sqrt{p_\theta}\big),\ X^j \partial_j\big(\sqrt{p_\theta}\big) \big\rangle_{L^2(\Omega,\mu)} \qquad (4.15)$$
$$= \langle Z, Z\rangle_{L^2(\Omega,\mu)}, \qquad (4.16)$$

with Z = X^i ∂_i(√p_θ). Given the assumptions made on the family p_θ, g is thus a positive definite symmetric section of TM* ⊗ TM*, hence a Riemannian metric on M, called the Fisher Information Metric (FIM).

Remark 11. The mapping I : θ ↦ √p_θ embeds M as a submanifold of the unit sphere in L²(Ω, µ), and the Fisher information metric is just the pullback of the ambient metric of L²(Ω, µ) with respect to I.

The next proposition is well known in statistics, provided partial derivatives and expectation commute.

Proposition 12. The Fisher information satisfies

$$\mathbb{E}[\partial_i l_\theta\, \partial_j l_\theta] = -\mathbb{E}[\partial_i \partial_j l_\theta]. \qquad (4.17)$$


¹ The summation convention on repeated indices is used throughout the document.
Proof. Taking the second derivative of the score yields

$$\partial_i \partial_j l_\theta = \frac{\partial_i \partial_j p_\theta}{p_\theta} - \frac{\partial_i p_\theta\, \partial_j p_\theta}{p_\theta^2}. \qquad (4.18)$$

Taking the expectation and using the commutation assumption between derivatives and expectations:

$$0 = \partial_i \partial_j \int_\Omega p_\theta\, d\omega = \int_\Omega \partial_i \partial_j p_\theta\, d\omega = \mathbb{E}\left[\frac{\partial_i \partial_j p_\theta}{p_\theta}\right]. \qquad (4.19)$$

Now, the claim follows from

$$\mathbb{E}[\partial_i l_\theta\, \partial_j l_\theta] = \int_\Omega \frac{\partial_i p_\theta\, \partial_j p_\theta}{p_\theta}\, d\omega. \qquad (4.20)$$

The form of Fisher information given by Proposition 12 is particularly suitable for distributions belonging to the exponential family. This fact will be used in the next section.

Geodesic distances

We now turn to the computation of geodesic distances. Clearly, for a given number of trajectories, the closer the optimal distribution p̄ to the uniform distribution, the more complex the traffic. Note that the uniform distribution on S can be regarded as the limit of GvM_N distributions as the parameter λ goes to zero. We therefore proceed to compute the geodesic distance between GvM_N distributions in Maksimov form. We recall without proof basic facts that can be found in any textbook dealing with Riemannian geometry.

Proposition 13. Let (M, g) be a Riemannian manifold. There exists a unique Koszul connection ∇ on TM, called the Levi-Civita connection, such that, for any vector fields X, Y, Z:

• ∇_X Y − ∇_Y X = [X, Y];

• Z g(X, Y) = g(∇_Z X, Y) + g(X, ∇_Z Y).

Proposition 14. Under the same assumptions as in Proposition 13, if γ : [a, b] → M is a smooth path, its length is defined as

$$l(\gamma) = \int_a^b g^{1/2}\big(\dot\gamma(t),\ \dot\gamma(t)\big)\, dt. \qquad (4.21)$$

Remark 15. Since the length is invariant under a change of parametrization, one can assume that the domain of γ is the interval [0, 1].

The geodesic distance between two points p, q on M is defined as

inf {l(γ), γ(0) = p, γ(1) = q} .

A path γ realizing the infimum is called a minimizing geodesic.


Proposition 16. A minimizing geodesic γ satisfies the equation:

∇γ̇ γ̇ = 0. (4.22)

Definition 17. Let (e1 , . . . , eN ) be a basis of Tp M. Then:

∇ei ej = Γkij ek . (4.23)

The coefficients Γkij are called the Christoffel symbols of the connection ∇.
Proposition 18. In coordinates, Equation (4.22) becomes:

γ̈ k + Γkij γ̇ i γ̇ j = 0, k = 1 . . . N. (4.24)

This proposition gives a practical way of computing a geodesic by integrating an ordinary differential equation. Canned routines solving boundary value problems can be used for that purpose. In the sequel, an explicit form for the Christoffel symbols is given.
Let p_η belong to the exponential family, parameterized by some vector η ∈ R^N:

$$p_\eta(x) = \exp\big[\langle \eta, x\rangle - \psi(\eta)\big]. \qquad (4.25)$$

Its Fisher information matrix is defined as

$$g_{i,j}(\eta) = -\mathbb{E}\big[\partial^2_{\eta_i \eta_j} l_\eta\big], \qquad (4.26)$$

in which l_η(x) := ln p_η(x). Using Equation (4.25), it follows that

$$g_{i,j}(\eta) = \partial^2_{\eta_i \eta_j} \psi(\eta). \qquad (4.27)$$

In the case of the GvM_N density, it is convenient to use the Maksimov parametrization as in Equation (4.5). Since ψ = ln G_0, the derivatives to be considered are

$$\partial^2_{\lambda_i \lambda_j} \log G_0 = \frac{\partial^2_{\lambda_i \lambda_j} G_0}{G_0} - \frac{\partial_{\lambda_i} G_0 \cdot \partial_{\lambda_j} G_0}{G_0^2}. \qquad (4.28)$$

Exchanging derivative and integral is valid, so that

$$\partial_{\lambda_i} G_0 = \int_S T_i(\theta)\, \exp\langle \lambda,\ \mathbf{T}(\theta)\rangle\, d\theta \qquad (4.29)$$

and

$$\partial^2_{\lambda_i \lambda_j} G_0 = \int_S T_i(\theta)\, T_j(\theta)\, \exp\langle \lambda,\ \mathbf{T}(\theta)\rangle\, d\theta, \qquad (4.30)$$

in which T_{2k−1}(θ) = cos kθ and T_{2k}(θ) = sin kθ, k = 1, . . . , N.


Introducing the Fourier coefficients of φ, namely

$$a_k(\varphi) = \frac{1}{\pi} \int_S \varphi(\theta) \cos(k\theta)\, d\theta, \qquad k \in \mathbb{N},$$

and

$$b_k(\varphi) = \frac{1}{\pi} \int_S \varphi(\theta) \sin(k\theta)\, d\theta, \qquad k \in \mathbb{N}^*,$$

we see that G_0 = πa_0(φ) and that the derivatives are given by

$$\partial_{\lambda_i} G_0(\lambda) = \begin{cases} \pi a_k(\varphi) & \text{if } i = 2k - 1, \\ \pi b_k(\varphi) & \text{if } i = 2k, \end{cases}$$
and

$$\partial^2_{\lambda_i \lambda_j} G_0 = \begin{cases}
\displaystyle\int_S \frac{\cos(k+l)\theta + \cos(k-l)\theta}{2}\, \varphi(\theta)\, d\theta & \text{if } i = 2k-1,\ j = 2l-1, \\[6pt]
\displaystyle\int_S \frac{\sin(k+l)\theta - \sin(k-l)\theta}{2}\, \varphi(\theta)\, d\theta & \text{if } i = 2k-1,\ j = 2l, \\[6pt]
\displaystyle\int_S \frac{\sin(k+l)\theta + \sin(k-l)\theta}{2}\, \varphi(\theta)\, d\theta & \text{if } i = 2k,\ j = 2l-1, \\[6pt]
\displaystyle\int_S \frac{\cos(k-l)\theta - \cos(k+l)\theta}{2}\, \varphi(\theta)\, d\theta & \text{if } i = 2k,\ j = 2l
\end{cases}$$

$$= \frac{\pi}{2} \begin{cases}
a_{k+l}(\varphi) + a_{k-l}(\varphi) & \text{if } i = 2k-1,\ j = 2l-1, \\
b_{k+l}(\varphi) - b_{k-l}(\varphi) & \text{if } i = 2k-1,\ j = 2l, \\
b_{k+l}(\varphi) + b_{k-l}(\varphi) & \text{if } i = 2k,\ j = 2l-1, \\
a_{k-l}(\varphi) - a_{k+l}(\varphi) & \text{if } i = 2k,\ j = 2l.
\end{cases}$$
The Fisher information metric then reads

$$g_{ij} = \frac{\partial^2_{\lambda_i \lambda_j} G_0}{G_0} - \frac{\partial_{\lambda_i} G_0 \cdot \partial_{\lambda_j} G_0}{G_0^2}
= \begin{cases}
\dfrac{1}{2}\,\dfrac{a_{k+l} + a_{k-l}}{a_0} - \dfrac{a_k a_l}{a_0^2} & \text{if } i = 2k-1,\ j = 2l-1, \\[8pt]
\dfrac{1}{2}\,\dfrac{b_{k+l} - b_{k-l}}{a_0} - \dfrac{a_k b_l}{a_0^2} & \text{if } i = 2k-1,\ j = 2l, \\[8pt]
\dfrac{1}{2}\,\dfrac{b_{k+l} + b_{k-l}}{a_0} - \dfrac{b_k a_l}{a_0^2} & \text{if } i = 2k,\ j = 2l-1, \\[8pt]
\dfrac{1}{2}\,\dfrac{a_{k-l} - a_{k+l}}{a_0} - \dfrac{b_k b_l}{a_0^2} & \text{if } i = 2k,\ j = 2l,
\end{cases}$$

with k, l ∈ {1, . . . , N}, in which we omitted the dependence on φ when writing the Fourier coefficients to simplify the notation.
The Christoffel coefficients needed in the geodesic equation can be computed in much the same way, using the well-known formula

$$\Gamma^p_{ij} = \frac{1}{2}\, g^{pl} \left( \frac{\partial g_{li}}{\partial x^j} + \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l} \right), \qquad (4.31)$$
where the convention of summation on repeated indices is used and the notation g^{pl} stands for the (p, l) element of the inverse matrix g^{-1}. The third partial derivative of ln G_0 with respect to λ_i, λ_j, λ_p is readily obtained from Equation (4.28):

$$\partial^3_{\lambda_i \lambda_j \lambda_p} \log G_0 = \frac{\partial^3_{\lambda_i \lambda_j \lambda_p} G_0}{G_0} + 2\,\frac{\partial_{\lambda_i} G_0\, \partial_{\lambda_j} G_0\, \partial_{\lambda_p} G_0}{G_0^3} - \frac{\partial^2_{\lambda_i \lambda_j} G_0\, \partial_{\lambda_p} G_0 + \partial^2_{\lambda_p \lambda_i} G_0\, \partial_{\lambda_j} G_0 + \partial^2_{\lambda_j \lambda_p} G_0\, \partial_{\lambda_i} G_0}{G_0^2}. \qquad (4.32)$$

All terms in Equation (4.32) are already known except for the first one. Starting with Equation (4.29), all third partial derivatives can be obtained using the Fourier coefficients of φ as indicated below:

$$\partial^3_{\lambda_{2i}\lambda_{2j}\lambda_{2k}} G_0 = \frac{1}{8}\,(a_{i+j+k} + a_{i+j-k} + a_{i-j+k} + a_{i-j-k}),$$
$$\partial^3_{\lambda_{2i}\lambda_{2j}\lambda_{2k+1}} G_0 = \frac{1}{8}\,(b_{i+j+k} - b_{i+j-k} + b_{i-j+k} - b_{i-j-k}),$$
$$\partial^3_{\lambda_{2i}\lambda_{2j+1}\lambda_{2k+1}} G_0 = \frac{1}{8}\,(-a_{i+j+k} + a_{i+j-k} + a_{i-j+k} - a_{i-j-k}),$$
$$\partial^3_{\lambda_{2i+1}\lambda_{2j+1}\lambda_{2k+1}} G_0 = \frac{1}{8}\,(-b_{i+j+k} + b_{i+j-k} + b_{i-j+k} - b_{i-j-k}).$$
The geodesic distance between two GvM_N distributions p, q is the infimum of the lengths of smooth paths joining p and q. It is the length of a minimizing geodesic γ with endpoints p, q. Such a curve is the solution of a differential equation with boundary conditions:

$$\nabla_{\dot\gamma}\,\dot\gamma = 0, \qquad \gamma(0) = p, \quad \gamma(1) = q,$$

where ∇ is the Levi-Civita connection. Expanding γ in λ coordinates, that is, γ(t) = (γ¹(t), . . . , γ^{2N}(t)), gives

$$\ddot\gamma^i(t) + \Gamma^i_{jk}\, \dot\gamma^j(t)\, \dot\gamma^k(t) = 0, \quad i = 1, \ldots, 2N, \qquad \gamma^i(0) = p^i, \quad \gamma^i(1) = q^i, \qquad (4.33)$$

with p = (p¹, . . . , p^{2N}), q = (q¹, . . . , q^{2N}) in λ coordinates. Finding the geodesic distance between two GvM_N distributions can be accomplished numerically by a boundary value problem solver or a shooting algorithm. The computational cost is quite high, but it turns out on some examples that a rather good approximation is given by the SKL divergence. It was therefore decided to validate the concept using this value instead of the true one, leaving the design of an efficient geodesic algorithm for future developments.
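To make the boundary value formulation concrete, here is a minimal sketch built on scipy.integrate.solve_bvp; the christoffel callable is a placeholder for the Fourier-based expressions of Γ^i_{jk} derived above, and the straight-line initial guess is an illustrative choice.

```python
import numpy as np
from scipy.integrate import solve_bvp

def geodesic_between(p, q, christoffel, n_nodes=50):
    """Sketch: solve the geodesic BVP (Eq. 4.33) between two points
    p, q given in lambda coordinates (dimension 2N each).

    christoffel(x) -> array of shape (d, d, d) with Gamma[i, j, k],
    a placeholder for the Fisher-metric symbols of the GvM_N family.
    """
    d = len(p)

    def rhs(t, y):
        # State y = (gamma, gamma_dot), shape (2d, n_points).
        dydt = np.zeros_like(y)
        for m in range(y.shape[1]):
            g, gdot = y[:d, m], y[d:, m]
            Gamma = christoffel(g)
            dydt[:d, m] = gdot
            dydt[d:, m] = -np.einsum("ijk,j,k->i", Gamma, gdot, gdot)
        return dydt

    def bc(ya, yb):
        # Enforce gamma(0) = p and gamma(1) = q.
        return np.concatenate([ya[:d] - p, yb[:d] - q])

    t = np.linspace(0.0, 1.0, n_nodes)
    y0 = np.zeros((2 * d, n_nodes))
    for i in range(d):  # straight-line initial guess
        y0[i] = p[i] + t * (q[i] - p[i])
        y0[d + i] = q[i] - p[i]
    return solve_bvp(rhs, bc, t, y0)
```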

Remark 19. The asymptotic behavior of the GvM_N distribution in the limit of large arguments κ_i, i = 1, . . . , N, can be found using Laplace's approximation for integrals [14]. Let f ∈ C^∞([0, 2π]; R); then

$$\int_0^{2\pi} f(\theta) \exp\left(\lambda \sum_{i=1}^{N} \kappa_i \cos i(\theta - \mu_i)\right) d\theta = \sum_{\eta} f(\eta)\, e^{\lambda Q(\eta)} \sqrt{\frac{2\pi}{\lambda\, |Q''(\eta)|}}\; \big(1 + O(\lambda^{-1})\big), \qquad (4.34)$$

where Q is the mapping

$$Q(\theta) = \sum_{i=1}^{N} \kappa_i \cos i(\theta - \mu_i), \qquad (4.35)$$

and η ranges over the roots of Q′ such that Q′′(η) < 0, assuming that all roots are of multiplicity one. Introducing the complex polynomial P,

$$P(z) = -\sum_{j=1}^{N} j\,\kappa_j\, e^{-ij\mu_j} z^j, \qquad (4.36)$$

the roots of Q′(θ) are exactly the intersection points of the closed curve γ : t ∈ [0, 1] ↦ P(e^{i2πt}) with the real axis, and there are exactly N of them corresponding to maxima of Q.
Now, using Formula (4.34) again for the normalizing factor in Equation (4.3), it follows that

$$\lim_{\lambda\to+\infty} \mathbb{E}[f]_{\lambda\kappa_1,\ldots,\lambda\kappa_N;\,\mu_1,\ldots,\mu_N} = \frac{\sum_\xi |Q''(\xi)|^{-1/2}\, f(\xi)}{\sum_\xi |Q''(\xi)|^{-1/2}}, \qquad (4.37)$$

where ξ ranges over the η realizing the global maximum of Q. So, in the sense of distributions,

$$\lim_{\lambda\to+\infty} p_{\lambda\kappa_1,\ldots,\lambda\kappa_N;\,\mu_1,\ldots,\mu_N} = \frac{\sum_\xi |Q''(\xi)|^{-1/2}\, \delta_\xi}{\sum_\xi |Q''(\xi)|^{-1/2}}. \qquad (4.38)$$

The extra degree of freedom gained from the introduction of λ calls for a normalization condition. One possible choice is to set κ_N = 1.

4.3 Validation

4.3.1 Simulation
We conduct testing on a single cell to assess the fundamental functioning of
the model. The results from this step provide essential insights into accuracy
and basic performance.
We start by performing simulations. We shall:

(a) Generate an angular sample following a probability p◦ ;

(b) Compute the corresponding empirical Fourier coefficients;

(c) Compute the Maximum Entropy density that is compatible, in the relaxed setting described above, with our empirical Fourier coefficients.

(d) Compute the GvM_N distribution corresponding to the optimal density.

(e) Compute the SKL divergence between the GvM_N distribution q(θ) and the uniform distribution u(θ) as follows:

$$\mathrm{SKL}(q \,\|\, u) = \int_0^{2\pi} u(\theta) \ln \frac{u(\theta)}{q(\theta)}\, d\theta + \int_0^{2\pi} q(\theta) \ln \frac{q(\theta)}{u(\theta)}\, d\theta. \qquad (4.39)$$

This simulation will allow us to get an idea of the sample size that is
necessary for obtaining relevant information, merely by comparing the true
density with the one obtained through the maximization of entropy. Then, based on the obtained SKL divergence, we can estimate the complexity.
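A minimal sketch of step (e): with the optimal density tabulated on a regular grid of S, the SKL divergence of Equation (4.39) reduces to a simple quadrature. The grid size and the von Mises-like test density below are illustrative choices of ours.

```python
import numpy as np

def skl_to_uniform(q, dtheta):
    """Symmetrized Kullback-Leibler divergence (Eq. 4.39) between a
    density q sampled on a regular grid of S and the uniform density."""
    u = np.full_like(q, 1.0 / (2.0 * np.pi))
    kl_uq = np.sum(u * np.log(u / q)) * dtheta
    kl_qu = np.sum(q * np.log(q / u)) * dtheta
    return kl_uq + kl_qu

# Example with a von Mises-like density (one peak):
n = 1024
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dtheta = 2.0 * np.pi / n
q = np.exp(4.0 * np.cos(theta - np.pi))
q /= np.sum(q) * dtheta          # normalize to a probability density
print(skl_to_uniform(q, dtheta))
```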

To better understand this relationship, Table 4.1 presents the SKL di-
vergence between u(θ) and the GvMN q(θ) with varying numbers of peaks
and varying peak widths β. The goal is to understand how these parameters
affect the SKL divergence and their implications for calculating complexity
maps in ATM.

Table 4.1: SKL divergence between u(θ) and the GvM_N q(θ) for varying numbers of peaks and peak widths β.

Peaks \ β    0.01     0.1      0.2      0.3
1            14.93    10.82    9.44     8.76
2            12.92    9.42     8.14     7.45
3            11.96    8.78     7.54     6.95
4            10.65    8.54     7.46     6.79
5            10.33    7.82     6.87     6.49
10           8.80     6.84     6.06     5.75
20           7.31     5.98     5.14     5.06

The simulation results indicate that as the number of peaks of the GvM_N increases, the SKL divergence between the uniform distribution and the GvM_N decreases. Specifically, with β = 0.01, the SKL divergence de-
creases dramatically from 14.93 with one peak to 7.31 with 20 peaks. This
trend is consistent across all values of β. This can be explained by the fact
that increasing the number of peaks makes the GvMN resemble the uniform
distribution more closely, as multiple peaks create a flatter distribution. In
the context of ATM, an increase in the number of peaks corresponds to an
increase in the number of aircraft headings. As the number of headings
increases, the distribution of aircraft becomes more uniform, reducing the
difference between the actual and uniform distributions, thereby increasing
air traffic complexity.
Furthermore, the results show that as the value of β increases, the SKL
divergence decreases. For instance, as β increases from 0.01 to 0.3, the SKL divergence decreases from 14.93 to 8.76 with one peak and
from 7.31 to 5.06 with 20 peaks. A larger value of β causes the peaks of
the GvMN to become flatter, making the distribution less concentrated and
more similar to the uniform distribution (see Figure 4.2). In ATM, increasing
the peak width β corresponds to more dispersed aircraft headings.
Figure 4.2: The GvM_N with different β values and peak numbers. Top to bottom: β = 0.01 and β = 0.3; left: 1 peak; right: 20 peaks.

Special cases, such as when the number of peaks is one and the value of β increases from 0.01 to 0.3, show a significant decrease in SKL divergence from 14.93 to 8.76. This indicates that even with a single peak, making the peak flatter and more spread out (larger β) can increase the complexity. Similarly, with 20 peaks, the SKL divergence is the lowest among all peak numbers across all values of β, emphasizing that multiple peaks make the GvM_N closer to the uniform distribution, especially when the β value is also large.
These results not only provide insights into how the parameters of the
GvMN affect its divergence from the uniform distribution but also lay the
foundation for calculating complexity maps in ATM. Understanding these
relationships may assist air traffic managers in predicting, planning, and
adjusting air traffic flows to ensure optimal safety and efficiency. Making
the distribution of headings more concentrated can help reduce air traffic
complexity, thereby improving safety and efficiency in air traffic coordination.

4.3.2 Implementation

In the simulation phase, we investigated the behavior of the SKL divergence under varying distributions of flight headings (θ) within individual cells. Each cell has different concentrations and spreads of flight headings. Specifically, we tested the metric on a synthetic dataset where flight headings were distributed according to the number of peaks (concentration points) and their widths.
Building upon the insights gained from simulations, in order to validate
our findings on a real dataset, we proposed a new complexity index, called
Composite Complexity Index, in short, CCI (see Equation (4.40)). It is
weighted by the SKL divergence and the number of flights.
Our analysis utilizes flight data collected from ATL [107]. The dataset
provides a snapshot of flight trajectories on a specific date, enabling a detailed
examination of spatial distributions and flight path characteristics within
the airspace. The heading information from each trajectory is particularly
leveraged in our computation of the complexity map, which is based on
the SKL divergence determined in our simulation phase. These distances
quantify the directional dissimilarities between flight paths, offering insights
into the spatial complexities of aircraft movements around ATL.

Composite Complexity Index (CCI)

Local complexity depends on the intensity of air traffic and on the angular
distribution of trajectories. Each individual quantity is, of course, insuffi-
cient. Intense traffic with a predominantly unidirectional distribution or a
spread distribution with low-intensity traffic is not very complex. Therefore,
we choose, as a complexity index, a criterion that increases with both the
spread of the distribution and the intensity of traffic, named CCI. Since the
spread of the distribution is measured using the SKL divergence, it is natural
to choose, as a complexity criterion, the ratio

$$\mathrm{CCI} = \frac{n}{a + \mathrm{SKL}(q \,\|\, u)}, \qquad (4.40)$$
in which n is the number of flights through the cell under consideration and
SKL (q∥u) is the SKL divergence from q to u. The positive parameter a avoids
singularity of the formula in the case where q = u. It is chosen empirically,
in such a way that the contrast in the complexity map is satisfactory.
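A minimal sketch of the per-cell index follows; it reuses the skl_to_uniform helper sketched in Subsection 4.3.1, and the default value of a is an illustrative choice, to be tuned for map contrast as noted above.

```python
def composite_complexity_index(n_flights, skl, a=1.0):
    """Composite Complexity Index of Eq. (4.40) for one cell.

    n_flights : number of flights crossing the cell,
    skl       : SKL divergence between the cell's GvM density and
                the uniform density (e.g., from skl_to_uniform),
    a         : positive constant avoiding the singularity at q = u.
    """
    return n_flights / (a + skl)

# Example: a busy cell with near-uniform headings scores high,
# a quiet cell with one dominant heading scores low.
print(composite_complexity_index(n_flights=103, skl=0.4))
print(composite_complexity_index(n_flights=1, skl=6.0))
```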

Dataset

Based on the dataset described in Subsection 2.5.1 of Chapter 2, we build complexity maps of air traffic by focusing on records of one day of air traffic at ATL airport on December 26, 2018, which includes 215 flights. The detailed
attributes of each trajectory are outlined in Table 2.1 and Table 2.2.
Figure 4.3 illustrates the one-day traffic density of trajectories across
ATL on December 26, 2018. The highest number of flights occurs around 13:00. In contrast, the lowest flight density is observed
at around 06:00 to 08:00. There are smaller peaks observed in the afternoon
and evening, notably around 18:00 and 21:00. Overall, there is a noticeable
decline in flight activity from midnight to early morning, followed by a sharp
increase starting at 11:00, peaking in the mid-afternoon, and then stabilizing
at a relatively high level until the late evening.

Figure 4.3: Traffic density of trajectories across ATL on December 26, 2018.

All flight trajectories on this day can be seen in Figure 4.4. From the
figure, we can observe several prominent, thick lines suggesting popular flight
corridors, particularly in the northeast-southwest and northwest-southeast
directions.

Figure 4.4: All flight trajectories across ATL on December 26, 2018.

Experimental Results

In order to build a complexity map, we determine the radius of the cell, and then we count the number of flights entering each cell as well as the heading values. After obtaining the values of the SKL divergence (following the steps illustrated in Subsection 4.3.1), we employ a color map where
blue denotes the lowest complexity indices and red the highest ones. This
gradient allows for clear visualization of spatial complexity variations, aiding
in immediate interpretation of relative intensity levels across the mapped
area.
To analyze the impact of the number of flights and the SKL divergence
on the CCI of the airspace, we performed experiments on a dataset from
ATL airport described in Subsection 4.3.2. The cell radius was chosen equal to 10 km. We obtained the complexity maps in Figure 4.5.
Figure 4.5b presents a complexity map based on the traffic intensity.
Row 12 and columns from 2 to 18 show very high complexity: indeed a large
number of flights traverse these areas. The regions with the lowest complexity

(a) Trajectories (b) Based on traffic intensity

(c) Based on SKL divergence (d) Based on CCI

Figure 4.5: Complexity maps on ATL airport dataset.

are rows from 0 to 6 and columns from 6 to 20 having fewer flights.
Figure 4.5c displays a complexity map based on SKL divergence. The
rows from 5 to 10 and columns from 7 to 15 show high complexity. Conversely,
rows from 0 to 4 and columns from 0 to 6 exhibit lower complexity. This
suggests that the central region experiences significant variability in distances
between points, while the northwestern region has lower variability.
Figure 4.5d is a composite of the previous two factors, providing a more
comprehensive complexity map. The cells at row 12 and columns from 2 to 18
stand out again, indicating high complexity where both SKL divergence and
the number of flights are high. The least complex regions are rows from 0 to 6
and columns from 6 to 20, showing that when both factors are low, the overall
complexity decreases. This combination offers a thorough understanding of
complexity distribution, aiding in identifying areas that require attention in
ATM.
Overall, the complexity map clearly indicates that the point of highest
complexity is at the intersection of the main flight paths. This position
represents the densest air traffic and highest workload for ATCOs. The points
with the lowest complexity correspond to minimal workload for controllers.
For more detailed information, we compare the distance between the GvM distribution (blue line) and the uniform distribution (green dashed line) in three scenarios: high, medium, and low complexity.
While the cell with index (12,1) has high complexity, with 103 flights crossing it, the lowest complexity index is found in cell (7,5), with only one aircraft flying over. The GvM distributions obtained in these two cells are illustrated in Figure 4.6 and Figure 4.7, respectively.
Figure 4.6 shows three predominant headings (near π, 3π/2, and 2π). This distribution deviates significantly from the uniform distribution.
The cell (16,16) exhibits a moderate complexity index (see Figure 4.8): the GvM distribution deviates noticeably from the uniform distribution, indicating an average level of complexity in this region. This is illustrated by the mixture of various headings (concentrated on two directions) in the inset diagram, which suggests a moderate level of traffic diversity.
Figure 4.6: High complexity cell: many flights, diverse directions.

Figure 4.7: Low complexity cell: few flights, few directions.

Figure 4.8: Medium complexity cell: moderate flights, moderate directions.

An interesting observation is that, although cell (15,7) has only 5 flights, these are concentrated in three directions close to π/6, π, and 2π, which increases the complexity index and results in a high complexity level for this cell (see Figure 4.9).

Figure 4.9: High complexity cell: fewer flights, diverse directions.

4.4 Conclusions
We have developed a methodology for assessing the local complexity of air
traffic. This strategy is decomposed into three steps: we first estimate, at each point of a Cartesian grid, the probability density of aircraft headings in the corresponding cell, using the principle of Maximum Entropy (this was detailed in Chapter 3); then we compute an approximation of the geodesic distance to the uniform distribution, considered for obvious reasons as the distribution of maximum complexity; finally, we define and compute an index of complexity that combines both the angular distribution and the intensity of the local traffic. This index of complexity is evaluated at each point of our grid, yielding a complexity map.
The chosen formula for the index of complexity is one choice among
others. However, our construction on real data seems to give what one may
expect. Other criteria may be obtained by considering the discrepancy be-
tween the heading distributions at entry and exit points of each cell. In any event, it appears that our strategy for building complexity maps is both nu-
merically tractable and efficient. The problem of efficient geodesic distance
computation will be addressed in a future work.

Chapter 5

Maximum Entropy multivariate density estimation
This chapter explores the computation of multivariate probability distribu-
tions using the Maximum Entropy principle. The statistics associated with
a family of feature functions provide linear constraints on the unknown den-
sity. Discretization of the sought distribution is avoided, and the problem
then falls under partially finite convex programming. The solution is ex-
pressed through a primal-dual relationship, where the dual variable is ob-
tained by maximizing a concave function of a finite number of variables,
whose properties depend on the choice of the relaxation function. Various
types of constraint relaxations are considered, and in all cases, solving the
dual problem can be achieved using now-classical optimization algorithms.
The proposed formalism could, incidentally, optimize entropies other than
the Kullback-Leibler one. It extends previous results in which the sampling
space was assumed to be finite.

5.1 Introduction
In a number of applications, as in the modeling of animal or vegetal species
distributions or in the evaluation of air traffic density, one is facing the prob-
lem of estimating a multivariate probability distribution from observed data
in the form of samples, which are considered as being drawn independently
according to the unknown distribution. In this chapter, we develop a non-
parametric approach based on a combination of Maximum Entropy and the
method of moments.
In [42], such a problem was considered for discrete finite sample spaces,

and tackled by means of the Maximum Entropy principle leading to a finite
dimensional optimization problem under linear moment constraints. Each
moment constraint is obtained via some statistic, that is, by estimating the
expectation of some feature function. Such a problem is tackled by means
of Fenchel duality, which provides a flexible framework for introducing the
relaxation of the above mentioned constraints. A potential function encodes
the type of relaxation (named regularization in the above reference). While
the equality constraint (no relaxation at all) is encoded by means of the
indicator function of a singleton, box constraints are also considered, as well
as least-square relaxation.
The Fenchel duality formalism provides a machinery that contains all
cases. Our main purpose in this chapter is to show that the nice framework
proposed in [42] can be extended to the more general setting where the sample
space is no longer discrete. We therefore consider the optimization of the so-
called differential entropy or of its relative counterpart, the Kullback-Leibler
divergence between probability measures.
Such an extended framework relies on extensions of the Fenchel duality
theorem, as well as on Rockafellar’s theory of convex integral functionals.

5.2 Statement of the problem

Let Ω ⊂ R^d be a bounded domain. We assume throughout that Ω is endowed with the σ-algebra of its Borel sets. The space of Borelian functions on Ω is denoted by Bor(Ω).
Points x1 , . . . , xn are observed in Ω. These points are assumed to be
drawn according to some unknown probability P on Ω, and the fundamental
problem is that of recovering P . Feature functions f1 , . . . , fm : Ω → R are
chosen, from which statistics are derived:
$$\langle f_k \rangle = \frac{1}{n} \sum_{j=1}^{n} f_k(x_j), \qquad k = 1, \ldots, m. \qquad (5.1)$$

Recall that the above equation expresses the expectation of fk under the
empirical measure
$$P_{\mathrm{emp}} = \frac{1}{n} \sum_{j=1}^{n} \delta_{x_j},$$

in which δx denotes the Dirac measure at x. The Maximum Entropy principle
states, in its native form, that among all probability measures P that are
consistent with the moment constraints

$$\langle f_k \rangle = \int f_k\, dP, \qquad k = 1, \ldots, m, \qquad (5.2)$$

one should select the probability P̄ which minimizes the Kullback-Leibler relative entropy K(P‖P◦) of P with respect to P◦. Here, P◦ is a reference measure, supported in Ω, which encodes the prior knowledge one may have about P. If no prior knowledge is available, it is customary to choose P◦ to be the uniform measure on Ω. Recall that the Kullback-Leibler relative entropy of P with respect to P◦ is defined as

$$K(P \,\|\, P_\circ) := \begin{cases} \displaystyle\int \ln \frac{dP}{dP_\circ}\, dP & \text{if } P \prec\!\prec P_\circ, \\[6pt] \infty & \text{otherwise}. \end{cases}$$
Here, dP/dP◦ denotes the Radon-Nikodym derivative of P with respect to P◦, and the notation P ≺≺ P◦ means that P is absolutely continuous with respect to P◦. We are facing the following optimization problem:

$$(\mathcal{P}_\circ)\qquad \text{Minimize } K(P \,\|\, P_\circ) \qquad \text{s.t.}\quad 1 = \int_\Omega dP, \quad y = \int_\Omega \mathbf{f}\, dP.$$

The shorthand s.t. stands for subject to, while y = (y_1, . . . , y_m) ∈ R^m is the vector whose k-th component is equal to ⟨f_k⟩, and f is the R^m-valued function whose k-th component is equal to f_k.
Problem (P◦ ) is an infinite dimensional convex problem with finitely
many linear constraints. Although the optimized variable P lies in the space
of Radon measures on Ω, it is equivalent to search for its density with re-
spect to P◦ . As a matter of fact, minimizing the Kullback-Leibler divergence
discards measures that are not absolutely continuous with respect to P◦ .
Moreover, since the statistics in Equation (5.1) are inaccurate values of
the moments under consideration, the constraints in Equation (5.2) must be
relaxed. Following [42] we introduce, for the purpose of relaxation, a convex
potential function U : Rm → (−∞, ∞].

We therefore obtain the relaxed, functional counterpart of Problem (P◦ ),
namely
$$(\mathcal{P})\qquad \text{Minimize } H(p) + U(Fp) \qquad \text{s.t.}\quad 1 = Ip,$$
in which the variable p is the Radon-Nikodym derivative of the searched
measure P with respect to P◦ . The neg-entropy functional H is defined by
$$H(p) := \int h_\circ\big(p(x)\big)\, dP_\circ(x) \quad\text{with}\quad h_\circ(t) := \begin{cases} t \ln t & \text{if } t > 0, \\ 0 & \text{if } t = 0, \\ \infty & \text{if } t < 0, \end{cases}$$
and the linear operators F and I are respectively defined by

$$Fp = \int_\Omega \mathbf{f}(x)\, p(x)\, dP_\circ(x) \quad\text{and}\quad Ip = \int_\Omega p(x)\, dP_\circ(x).$$

The potential function U is generally a convex non-negative function which attains its minimum value at y. The case of the non-relaxed constraint [42] corresponds to the choice U_y(s) = U_y^{(0)}(s) = δ_{\{y\}}(s). Recall that the indicator function of a set S is the function

$$\delta_S(s) := \begin{cases} 0 & \text{if } s \in S, \\ \infty & \text{otherwise}. \end{cases}$$

Clearly, this choice enforces the equality Fp = y. In [42], the case of box
constraints was also considered. It corresponds to the choice
$$U_y(s) = U_y^{(1)}(s) = \delta_{B_y}(s), \quad\text{in which}\quad B_y := \prod_{k=1}^{m} [y_k - \beta_k,\ y_k + \beta_k],$$

with β = (β_1, . . . , β_m) ∈ (R₊*)^m. Since the convex conjugate of δ_{B_y}(·) fails to
be differentiable, one may consider an approximation of the box constraint
that yields a differentiable conjugate as in [42, Section 5.2]. This will also be
considered in the present chapter. The least square relaxation corresponds
to the choice

$$U_y(s) = U_y^{(2)}(s) = \frac{1}{2\alpha}\, \|y - s\|^2,$$

in which ‖·‖ denotes the Euclidean norm and α is some positive constant.
We may also consider, more generally, the choice

$$U(s) = \frac{1}{2\alpha}\, \|y - s\|_Q^2,$$

in which ‖z‖²_Q = ⟨z, z⟩_Q = ⟨z, Qz⟩ with Q a positive definite symmetric
matrix. An interesting example of this is obtained if we let Q = Σ−1 with Σ
an estimate of the covariance matrix of the vector y: the relaxation then
accounts for the variance of each principal component of the data vector
regarded as a random vector.
Implicitly, the above problem is stated in the space of integrable func-
tions having well-defined integrals against all feature functions. In general,
such a space is a subspace of L1P◦ (Ω), the space of measurable functions on Ω
that are integrable with respect to P◦ . We note that, in the case where all
feature functions are bounded, our natural workspace is L1P◦ (Ω) itself. In
general, however, the natural workspace for Problem (P) is a proper sub-
space of L1P◦ (Ω). The reason for this is that the linear mapping F is well
defined on the space
$$\big\{\, p \in L^1_{P_\circ}(\Omega) \;\big|\; p f_k \in L^1_{P_\circ}(\Omega),\ k = 1, \ldots, m \,\big\}.$$
Generally speaking, the implicit workspace involved in any moment problem
that involves feature functions taken in a set Φ is the Köthe space
$$L_{P_\circ}(\Omega; \Phi) := \big\{\, p \in \mathrm{Bor}(\Omega) \;\big|\; \forall f \in \Phi,\ pf \in L^1_{P_\circ}(\Omega) \,\big\}. \qquad (5.3)$$

In the next section, we shall explore the dualization of Problem (P) and show how this dualization provides algorithms for effective computation of Maximum Entropy solutions in our infinite-dimensional setting, without having to discretize the statistical problem under consideration.

5.3 Maximum Entropy solutions


We now consider the dual approach for solving Problem (P). The cornerstone of the analysis is an extension of Fenchel's duality theorem for which the primal variable lies in an infinite-dimensional space. For convenience, we define on R × R^m the function

$$G(\eta_\circ, \eta) := -\delta_{\{1\}}(\eta_\circ) - U(\eta)$$

and rewrite (P) in the equivalent form

$$(\mathcal{P})\qquad \text{Minimize } H(p) - G(Ap),$$

in which Ap := (Ip, Fp).

Theorem 20 (Fenchel duality). Suppose we are given L and L⋆ two vector
spaces, algebraically paired by the bilinear mapping ⟨·, ·⟩; A : L → Rd , a
linear mapping and A⋆ : Rd → L⋆ , its adjoint; H : L → (−∞, ∞], a proper
convex functional and H ⋆ : L⋆ → (−∞, ∞], its convex conjugate; G : Rd →
[−∞, ∞), a proper concave function and G⋆ : Rd → [−∞, ∞), its concave
conjugate. If the constraint qualification condition
(QC) ri(A dom H) ∩ ri(dom G) ̸= ∅
is satisfied, then

$$\inf_{p \in L} \big\{ H(p) - G(Ap) \big\} = \max_{\lambda \in \mathbb{R}^d} \big\{ G^\star(\lambda) - H^\star(A^\star \lambda) \big\}.$$

The notation ri S stands for the relative interior of the set S, that is, the
interior of S in the induced topology of its affine hull. The effective domain
dom H of a convex function H is the set of points p for which H(p) < ∞,
and the effective domain dom G of a concave function is the set of points η
for which G(η) > −∞. The function G⋆ is the concave conjugate of G,
while H ⋆ denotes the convex conjugate of H. The reader unfamiliar with
convex analysis is invited to refer to [100, 55, 18].
Theorem 20 establishes the equality between the optimal value in the
primal problem (the minimization of the primal function H −G◦A) and that
of the dual problem (the maximization of the dual function G⋆ − H ⋆ ◦ A⋆ ). It
also says that the dual problem is attained, meaning that the dual problem
has at least one solution.
In order to tackle our generic Maximum Entropy problem, it remains to:

1. find conditions for the existence of a primal solution;


2. compute the conjugate of H, a highly non-trivial thing when H is an
integral functional;
3. link the possible primal solution to a dual solution.
Theorem 21 (Primal attainment). Together with the notation and assump-
tions of Theorem 20, suppose that
(QC ⋆ ) ri dom G⋆ ∩ ri dom(H ⋆ ◦ A⋆ ) ̸= ∅.
Suppose in addition that

(a) H ⋆⋆ = H and G⋆⋆ = G;

(b) there exists λ̄ maximizing the dual function and p̄ ∈ ∂H ⋆ (A⋆ λ̄) such
that H ⋆ ◦ A⋆ has gradient Ap̄ at λ̄.

Then p̄ minimizes the primal function.

The notation ∂F(x) stands for the subdifferential of the function F at x. The condition G⋆⋆ = G is satisfied whenever G is an upper semi-continuous proper concave function, a mild assumption satisfied by all examples of interest. The condition H⋆⋆ = H is more difficult to verify when H is an integral functional defined on a functional space such as L_{P◦}(Ω; Φ).
We owe to Rockafellar a series of works [103, 101, 102] allowing us to
compute the convex conjugate of an integral functional in a rather simple
way. Under certain conditions (concerning the integrand h of H but also the
functional workspace) it is possible to conjugate through the integral sign, that is
to say, to obtain the conjugate of the integral as the integral with conjugate
integrand.
An integral functional on Ω is a functional of the form
$$H(p) = \int h\big(p(x), x\big)\, dP_\circ(x), \qquad p \in L.$$

Here, we endow Ω with a σ-algebra and with a positive σ-finite measure P◦, and L is a space of measurable functions. We note that the dependence of h on its second argument could be ignored here, since we focus on the case where h(p(x), x) = h◦(p(x)).
A space L of measurable functions is decomposable if it is stable under
bounded alteration on sets of finite measure, that is to say, if for every p ∈ L,
every set T such that P◦ (T ) is finite and every measurable function q that is
bounded on T , the function

$$\tilde{p}(x) := \begin{cases} p(x) & \text{if } x \in \Omega \setminus T, \\ q(x) & \text{if } x \in T \end{cases}$$

belongs to L. In the context of this chapter, we are mostly interested in the vector space L = L_{P◦}(Ω; Φ) with Φ = {1_Ω, f_1, . . . , f_m}, where 1_Ω denotes the function identically equal to 1 on Ω.

Theorem 22 (Conjugacy through the integral sign). Let L and L⋆ be spaces of measurable functions on Ω paired by means of the integral bilinear form

$$\langle p, \phi \rangle = \int p(x)\, \phi(x)\, dP_\circ(x).$$

Let h be a measurable integrand on R × Ω that is proper convex and lower semi-continuous in its first argument, and let H be the functional of integrand h:

$$H(p) = \int h\big(p(x), x\big)\, dP_\circ(x), \qquad p \in L.$$

Assume that L is decomposable and that H has nonempty effective domain. Then

$$H^\star(\phi) = \int h^\star\big(\phi(x), x\big)\, dP_\circ(x)$$

for every φ ∈ L⋆, in which h⋆(·, x) := (h(·, x))⋆. Moreover, H⋆ is convex on L⋆.

In our context, L and L⋆ are respectively the Köthe space L_{P◦}(Ω; Φ), as defined in Equation (5.3), and its dual Köthe space

$$L^\star_{P_\circ}(\Omega; \Phi) := \big\{\, \varphi \in \mathrm{Bor}(\Omega) \;\big|\; \forall p \in L_{P_\circ}(\Omega; \Phi),\ p\varphi \in L^1_{P_\circ}(\Omega) \,\big\},$$

with Φ = {1_Ω, f_1, . . . , f_m}. It was shown that if a family Φ of feature functions is such that every f ∈ Φ is integrable over every set of finite measure, then L_{P◦}(Ω; Φ) is decomposable, and moreover that if 1_Ω ∈ Φ, then L⋆_{P◦}(Ω; Φ) is also decomposable. See Lemma 1 and Proposition 1 in [78].
The entropy kernel h(t, x) ≡ h◦ (t) is clearly a measurable integrand that
is proper convex and lower semi-continuous in its first argument. Its convex
conjugate is given by

$$h_\circ^\star(\tau) = \exp(\tau - 1), \qquad \tau \in \mathbb{R}.$$

Since our pair of functional spaces (L_{P◦}(Ω; Φ), L⋆_{P◦}(Ω; Φ)) satisfies the requirements of Theorem 22, it follows that, for every φ ∈ L⋆_{P◦}(Ω; Φ),

$$H^\star(\varphi) = \int \exp\big(\varphi(x) - 1\big)\, dP_\circ(x).$$

Now, h◦⋆⋆ = h◦ since h◦ is proper convex and lower semi-continuous. Moreover, it is readily seen that

$$L^{\star\star}_{P_\circ}(\Omega; \Phi) := \big\{\, p \in \mathrm{Bor}(\Omega) \;\big|\; \forall \varphi \in L^\star_{P_\circ}(\Omega; \Phi),\ p\varphi \in L^1_{P_\circ}(\Omega) \,\big\} = L_{P_\circ}(\Omega; \Phi).$$

See Property (f), page 207, in [78]. It follows that conjugacy through the integral sign applies one more time: for all p ∈ L_{P◦}(Ω; Φ),

$$H^{\star\star}(p) = \int h_\circ^{\star\star}\big(p(x)\big)\, dP_\circ(x) = H(p).$$

Theorem 23 (Primal-dual relationship). Together with the notation and as-


sumptions of Theorem 20, assume that H is an integral functional satisfying
the assumptions of Theorem 22. Assume further that G is proper concave and
upper semi-continuous (so that it equals its concave bi-conjugate) and that the
effective domain of the dual function D := G⋆ − H ⋆ ◦ A⋆ has nonempty inte-
rior. Assume finally that the conjugate integrand h⋆ is differentiable over R,
and that there exists some dual-optimal vector λ̄ in int dom D. If

$$\bar{p} := h^{\star\prime}\big([A^\star \bar\lambda](\cdot),\ \cdot\big) \in L,$$

then p̄ is a primal solution.

Let us now get back to our Maximum Entropy problem. Most potential
functions U are of the form U (η) = V (y − η) with V a proper convex lower
semi-continuous (and even) function. For example, the standard least-square
relaxation corresponds to the case where

$$V(\eta') = \frac{1}{2\alpha}\, \|\eta'\|^2, \qquad \eta' \in \mathbb{R}^m.$$
An easy computation shows that, in this case,

$$G^\star(\lambda_\circ, \lambda) = \lambda_\circ + \langle y, \lambda\rangle - V^\star(\lambda).$$

The dual problem to Problem (P) then consists in maximizing the function

$$D(\lambda_\circ, \lambda) := \lambda_\circ + \langle y, \lambda\rangle - V^\star(\lambda) - \exp(\lambda_\circ - 1) \int \exp\langle \lambda,\ \mathbf{f}(x)\rangle\, dP_\circ(x), \qquad (5.4)$$

in which we account for conjugacy through the integral sign (Theorem 22).
We note that the effective domain of D coincides with that of V ⋆ whenever
the feature function f has bounded components.

Provided the qualification of constraints (QC) and its dual counterpart (QC⋆) are satisfied, Theorem 23 tells us that the solution to Problem (P) is given by

$$\bar{p}(x) = \exp\big[\bar\lambda_\circ - 1 + \langle \bar\lambda,\ \mathbf{f}(x)\rangle\big] = \exp(\bar\lambda_\circ - 1) \cdot \exp\langle \bar\lambda,\ \mathbf{f}(x)\rangle,$$

in which (λ̄◦, λ̄) maximizes the dual function in Equation (5.4). Applying Fermat's principle, we find that (λ̄◦, λ̄) must satisfy the condition

$$(0, 0) \in \partial D(\bar\lambda_\circ, \bar\lambda).$$

Here, the subdifferential is taken in the sense of concavity. It is readily checked that the latter condition is equivalent to

$$\exp(\bar\lambda_\circ - 1) = \left( \int \exp\langle \bar\lambda,\ \mathbf{f}(x)\rangle\, dx \right)^{-1} \quad\text{and}\quad 0 \in \partial \tilde{D}(\bar\lambda),$$

in which

$$\tilde{D}(\lambda) := \langle \lambda, y\rangle - V^\star(\lambda) - \ln \int \exp\langle \lambda,\ \mathbf{f}(x)\rangle\, dx.$$

Therefore, maximizing the dual function in Equation (5.4) boils down to maximizing the function D̃. We observe that D̃ is concave and upper semi-continuous, as a consequence of the convexity and lower semi-continuity of the log-Laplace transform. We note that evaluating D̃(λ) relies on some numerical integration in dimension d. This is by no means a limitation, due to the existence of powerful numerical integration algorithms such as Monte Carlo integration, for example.
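As a minimal sketch of such an evaluation, the log-Laplace term in D̃ can be estimated by uniform Monte Carlo sampling over Ω. The domain, feature functions, and least-square conjugate V⋆(λ) = (α/2)‖λ‖² below are illustrative placeholders of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def d_tilde(lam, y, features, sampler, alpha=1.0, n_mc=100_000):
    """Monte Carlo sketch of the dual function D~(lambda).

    lam      : dual vector, shape (m,),
    y        : vector of observed statistics <f_k>, shape (m,),
    features : callable mapping points (n, d) -> feature matrix (n, m),
    sampler  : callable n -> uniform points in Omega, shape (n, d),
    alpha    : least-square relaxation, with V*(lam) = alpha/2 ||lam||^2.
    """
    x = sampler(n_mc)                      # uniform draws over Omega
    vol = 1.0                              # volume of Omega (unit box here)
    mc = vol * np.mean(np.exp(features(x) @ lam))
    return lam @ y - 0.5 * alpha * lam @ lam - np.log(mc)

# Illustrative setup: Omega = [0, 1]^2, features f(x) = (x1, x2, x1*x2).
features = lambda x: np.column_stack([x[:, 0], x[:, 1], x[:, 0] * x[:, 1]])
sampler = lambda n: rng.random((n, 2))
y = np.array([0.5, 0.5, 0.25])
print(d_tilde(np.zeros(3), y, features, sampler))
```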
Let us summarize the above achievements. Provided one can check Con-
ditions (QC) and (QC ⋆ ), which depend on the choice of the potential V and
on that of the feature functions f1 , . . . , fm , we compute the optimal vector λ̄
by maximizing D̃, and we then obtain the Maximum Entropy solution p̄ via
the formula

$$\bar{p}(x) = \frac{\exp\big\langle \bar\lambda,\ \mathbf{f}(x)\big\rangle}{\displaystyle\int \exp\big\langle \bar\lambda,\ \mathbf{f}(x)\big\rangle\, dx}, \qquad x \in \Omega.$$

Some potentials V give rise to nice differentiable concave unconstrained maximization problems. This happens whenever the conjugate V⋆ has full domain and is differentiable. A standard example is provided by the least-square relaxation. However, in general, the function V⋆ may suffer from

drawbacks such as failing to have full domain (so that the dual problem is
constrained) or being non-differentiable. A standard example is provided by
the case of box constraints.
In the smooth unconstrained case, quasi-Newtonian methods such as the BFGS algorithm are expected to provide fast and accurate solutions. If V⋆ is not differentiable, then the situation is not hopeless at all, thanks to the proximal gradient algorithm. As a matter of fact, in the latter case, the function to be maximized is a composite concave function, that is to say, a sum of two concave functions, one differentiable and the other non-differentiable. The reader interested in the optimization of composite
functions and the proximal gradient algorithm may refer to [12, 115] and the
references therein.
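For the box-constraint relaxation treated in the next section, V⋆(λ) = Σ_k β_k|λ_k|, whose proximal operator is componentwise soft-thresholding. The following sketch of a proximal gradient ascent is illustrative (fixed step size, toy smooth part), not a tuned implementation.

```python
import numpy as np

def soft_threshold(lam, t):
    """Proximal operator of the non-smooth term sum_k t_k |lam_k|,
    applied componentwise (t may be a vector)."""
    return np.sign(lam) * np.maximum(np.abs(lam) - t, 0.0)

def proximal_gradient_ascent(grad_smooth, beta, lam0, step=0.1, n_iter=500):
    """Maximize  S(lam) - sum_k beta_k |lam_k|  where S is the smooth
    concave part of D~ and grad_smooth returns its gradient.

    Sketch only: constant step size, no line search, no stopping
    criterion beyond the iteration budget.
    """
    lam = lam0.copy()
    for _ in range(n_iter):
        lam = soft_threshold(lam + step * grad_smooth(lam), step * beta)
    return lam

# Toy smooth part S(lam) = <lam, y> - ||lam||^2 / 2:
y = np.array([1.0, -0.5, 0.2])
grad_smooth = lambda lam: y - lam
beta = np.array([0.3, 0.3, 0.3])
print(proximal_gradient_ascent(grad_smooth, beta, np.zeros(3)))
# Converges to the soft-thresholded vector [0.7, -0.2, 0.0].
```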

5.4 Relaxation

Following [42], we explore a number of potential functions to be used for the reconstruction of multivariate distributions. We assume throughout that the domain of H⋆ ◦ A⋆ is nonempty, a mild assumption on the family of feature functions.

No relaxation at all. In this case V(η) = δ_{{0}}(η). The convex conjugate V⋆(λ) is then the function identically equal to zero. Condition (QC) reads y ∈ ri(F dom H) in this case, and Condition (QC⋆) is automatically satisfied. The reason is that the effective domain of G⋆ is then the whole of R^m, and that ri dom(H⋆ ◦ A⋆) is nonempty, as the relative interior of a nonempty convex set.

Box constraints. The function V is defined in this case by

$$V(\eta) = \delta_{\prod_{k=1}^{m} [-\beta_k,\, \beta_k]}(\eta) = \sum_{k=1}^{m} \delta_{[-\beta_k,\, \beta_k]}(\eta_k).$$

Its convex conjugate is then given by

$$V^\star(\lambda) = \sum_{k=1}^{m} \beta_k\, |\lambda_k| = \big\| \beta \odot \lambda \big\|_1,$$

in which ⊙ denotes the pointwise product. In this case, Condition (QC) reads

$$\left( \prod_{k=1}^{m} (y_k - \beta_k,\ y_k + \beta_k) \right) \cap \mathrm{ri}(F \operatorname{dom} H) \neq \emptyset.$$

Condition (QC⋆) is, again, automatically satisfied. Since the corresponding function D̃ is a composite function, the maximization of D̃ can be efficiently performed by the proximal gradient algorithm.

Smoothed inner approximation of box constraints. Let ψ_β : R → (−∞, ∞] be defined by

$$\psi_\beta(\eta) = \begin{cases} \dfrac{\eta}{2} \ln \dfrac{\beta+\eta}{\beta-\eta} + \beta \ln \sqrt{1 - \dfrac{\eta^2}{\beta^2}} & \text{if } \eta \in (-\beta, \beta), \\[8pt] \infty & \text{otherwise}. \end{cases}$$

The smooth inner approximation of box constraints, with the same box as in the previous paragraph, and its convex conjugate are respectively given by

$$V(\eta) = \sum_{k=1}^{m} \varepsilon_k\, \psi_{\beta_k}(\eta_k) \quad\text{and}\quad V^\star(\lambda) = \sum_{k=1}^{m} \beta_k \varepsilon_k \ln \operatorname{ch}\!\left(\frac{\lambda_k}{\varepsilon_k}\right),$$

in which εk controls the proximity of V (η) to the target box constraint po-
tential and, thereby, the proximity of the corresponding conjugate functions.
Figure 5.1 gives an idea of the approximation of the box constraint potential
and its conjugate. We note that the constraint qualification condition and
its dual counterpart are exactly as in the box constraint case. The advan-
tage here is that the dual function is differentiable, allowing the use of some
powerful quasi-Newtonian optimization method, such as BFGS.

Least squares. Given a symmetric positive definite matrix Q, the weighted least-square relaxation corresponds to the case where

$$V(\eta) = \frac{1}{2\alpha}\, \|\eta\|_{Q^{-1}}^2 \quad\text{and}\quad V^\star(\lambda) = \frac{\alpha}{2}\, \|\lambda\|_Q^2.$$

The standard least-square relaxation is obtained by letting Q = I, the identity matrix. If the covariance matrix Σ of the data vector, regarded as a

Figure 5.1: Plots of εψβ (η) and its conjugate for β = 1 and various values
of ε.

random vector, is available or estimated, we may choose Q = Σ. The corresponding relaxation then entails the squared Mahalanobis distance between
the data y and the moment vector Fp. It is remarkable that, one more time,
things look nicer on the dual side: no matrix inversion is required to evalu-
ate D̃(λ). Finally, the constraint qualification and its dual counterpart are
automatically fulfilled here.

5.5 Conclusion
In this chapter, we studied the Maximum Entropy approach to the problem of
reconstructing a spatial density from a finite statistical sample. The moment
constraints associated with the choice of feature functions are relaxed using
potential functions, as in [42].
The dualization of the problem allows the calculation of the solution den-
sity through the maximization of a dual problem in finite dimension, whose
properties depend on the chosen potential function. Partially finite convex
programming, as well as conjugacy of convex integral functionals, constitute
the major ingredients of the proposed developments. In all cases, the dual
problem is tractable, and the primal-dual relationship yields the solution.
The tools developed in this chapter will make it possible to solve the
problem of reconstructing spatial densities without resorting to discretiza-
tion of the problem. This is a definite advantage, since the choice of a

discretization introduces a certain bias, which is contrary to the spirit of the
Maximum Entropy method.
The choice of a family of feature functions remains a crucial question in
this field. The authors of this chapter are currently focusing on this issue,
which will be the subject of a subsequent publication.

Chapter 6

Conclusion and further work


6.1 Conclusion

In the first part of this thesis (Chapter 2), we concentrated on applying Bayesian inference to improve flight trajectory predictions. Bayesian inference provides a powerful framework for incorporating prior knowledge and updating predictions as new information becomes available, allowing a more nuanced and probabilistic approach to trajectory prediction. By leveraging Baye-
sian methods, we can better estimate the uncertainties inherent in trajectory
predictions and develop more effective strategies for managing air traffic in
increasingly congested airspace. Our BayesCoordLSTM model, which inte-
grates Bayesian optimization and coordinate transformation into a hybrid
ConvLSTM framework, has demonstrated superior performance in predict-
ing flight paths with higher accuracy and efficiency compared to existing
methods.
In the second part (Chapters 3 to 5), we focused on analyzing air traffic
complexity using concepts derived from information theory. We developed
a method to construct air traffic complexity maps in two parts: first, at
each point in the airspace, we estimate the angular probability density of the
headings of aircraft entering a circular cell centered at the considered point,
and then we evaluate the local complexity.
The first part, which is the subject of the article presented in Chapter 3,
uses the Maximum Entropy principle combined with the method of moments.
Fourier coefficients are estimated by empirical averages from trajectory data
over a fixed time window, and an angular density maximizing the entropy
while respecting the estimated moments is obtained via partially finite convex
programming. The result is a GvM (Generalized von Mises) density.
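As a minimal illustration of the moment-estimation step, the sketch below computes empirical trigonometric moments from a window of observed headings; the input conventions (headings in radians, one array per cell and time window) and the choice of order are assumptions for the example.

```python
import numpy as np

def trig_moments(headings, order=2):
    # Empirical Fourier (trigonometric) moments of the headings observed
    # in one circular cell over a fixed time window.  Returns the vector
    # (mean cos(k*theta), mean sin(k*theta)) for k = 1..order, which is
    # the moment data of the Maximum Entropy problem.
    k = np.arange(1, order + 1)[:, None]        # shape (order, 1)
    angles = k * np.asarray(headings)[None, :]  # shape (order, n_obs)
    return np.concatenate([np.cos(angles).mean(axis=1),
                           np.sin(angles).mean(axis=1)])
```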

The second part involves calculating a complexity index associated with
the obtained GvM density and the intensity of the flow traversing the cell.
The complexity associated with a given GvM is defined as a decreasing func-
tion of its geodesic distance to the uniform distribution. Calculating this
distance is theoretically possible but quite complex, so we approximate the
geodesic distance by the SKL (Symmetrized Kullback-Leibler) divergence. A
complexity map can then be constructed by multiplying the previous index
by the local traffic density.
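For illustration, the SKL divergence to the uniform circular law can be evaluated by simple quadrature, as in the sketch below; the unnormalized GvM log-density is assumed to be supplied by the caller, and the grid size is an arbitrary choice.

```python
import numpy as np

def skl_to_uniform(log_p_unnorm, n_grid=2048):
    # Symmetrized Kullback-Leibler divergence between a circular density
    # (given through its unnormalized log-density) and the uniform law
    # on [0, 2*pi), computed by rectangle-rule quadrature.
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    w = 2.0 * np.pi / n_grid
    p = np.exp(log_p_unnorm(theta))
    p /= p.sum() * w                        # normalize the density
    u = 1.0 / (2.0 * np.pi)
    kl_pu = np.sum(p * np.log(p / u)) * w   # KL(p || uniform)
    kl_up = np.sum(u * np.log(u / p)) * w   # KL(uniform || p)
    return kl_pu + kl_up

# Example with a GvM-type log-density (illustrative parameters):
# skl = skl_to_uniform(lambda t: 1.2 * np.cos(t) + 0.4 * np.cos(2 * t))
```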
Currently, the local traffic density is simply estimated by counting the
number of aircraft crossing the considered cell. However, we questioned the
notion of traffic density. A more instantaneous notion than the one used in
the approach described here could make sense in air traffic management. This
motivated the writing of the article presented in Chapter 5. In this article,
we consider a spatial distribution of points (which could be the positions
of aircraft at a given moment) as resulting from a random draw according
to an unknown law. We then propose an estimation of this law using the
Maximum Entropy principle combined with the method of moments. Again,
partially finite convex programming forms the theoretical core of the study.

6.2 Further work


Building on the foundations laid by this research, several avenues for future
work can be pursued to further enhance the capabilities and applications of
our methodologies:
Take into account weather conditions for flight trajectory prediction: Fu-
ture research should focus on developing techniques for dynamic adaptation
of hyperparameters in the BayesCoordLSTM model. This could involve real-
time adjustments based on changing flight scenarios and weather conditions,
improving the model’s adaptability and real-time predictive accuracy.
Consider discrepancies between heading distributions at entry and exit
points of cells: Exploring different criteria for defining and measuring air
traffic complexity could enhance our understanding and improve the accu-
racy of complexity maps. By comparing aircraft density at the entry and exit
points of a region (cell S), we can identify whether aircraft are changing their paths within that area. A significant difference between these densities suggests alterations in trajectories, indicating unpredictability and increasing the overall
complexity of the air traffic system.
Efficient geodesic distance computation: Future work could investigate optimizing algorithms for faster geodesic distance computation, incorpo-
rating real-time data for dynamic updates, and leveraging parallel processing
to handle larger datasets. Exploring these areas may enhance both accuracy
and efficiency in practical applications.
This research has established a robust framework for predicting flight
trajectories and assessing air traffic complexity. As air traffic volumes in-
crease, the deployment of advanced analytical tools such as those developed
in this study will become increasingly critical. By continuing to refine and
expand upon our methodologies, we can contribute to the ongoing efforts
to enhance the safety, efficiency, and effectiveness of air traffic management
systems worldwide.

References
[1] Abbasimehr, Hossein and Paki, Reza. “Improving time series forecasting
using LSTM and attention models”. In: Journal of Ambient Intelligence and
Humanized Computing 13.1 (2022), pp. 673–691. issn: 1868-5137, 1868-5145.
[2] Abdar, Moloud et al. “A review of uncertainty quantification in deep learn-
ing: Techniques, applications and challenges”. In: Information fusion 76
(2021), pp. 243–297.
[3] García, Adrián and Delahaye, Daniel and Soler, Manuel. “Air traffic com-
plexity map based on linear dynamical systems”. In: hal-02512103 (2019),
pp. 1–35.
[4] Ahmed, Mohammed A and El-Sheimy, Naser and El-Bakry, Hosam and
Moussa, Ahmed. “Machine learning-based prediction of air traffic complex-
ity and delay”. In: Journal of Air Transport Management 77 (2019), pp. 97–
109.
[5] Alharbi, Abdulrahman and Petrunin, Ivan and Panagiotakopoulos, Dim-
itrios. “Deep learning architecture for UAV traffic-density prediction”. In:
Drones 7 (2023), pp. 1–31.
[6] Amari, Shun-ichi. Information geometry and Its applications. Applied Math-
ematical Sciences. Springer Japan, 2016. isbn: 9784431559788.
[7] Amari, Shun-ichi and Karakida, Ryo and Oizumi, Masafumi. “Information
geometry connecting Wasserstein distance and Kullback–Leibler divergence
via the entropy-relaxed transportation problem”. In: Information Geometry
1 (2018), pp. 13–37.
[8] Amersfoort, Joost and Smith, Lewis and Teh, Yee and Gal, Yarin. “Simple
and scalable epistemic uncertainty estimation using a single deep determin-
istic neural network”. In: ICML. 2020.
[9] Arribas, Ignacio and Rodas-Jordá, Alberto and Toro-Rueda, Jose M del and
Parés, Ferran. “Evaluation of air traffic complexity with geometric metrics
and machine learning techniques”. In: IEEE Access 9 (2021), pp. 40894–
40905.

[10] Atencia, Miguel and Stoean, Ruxandra and Joya, Gonzalo. “Uncertainty
quantification through dropout in time series prediction by echo state net-
works”. In: Mathematics 8.8 (2020). issn: 2227–7390.
[11] Banavar, Sridhar and Sheth, Kapil S. and Grabbe, Shon. “Airspace complex-
ity and its application in air traffic management”. In: 2nd USA/EUROPE
Air traffic management R&D seminar (1998).
[12] Beck, Amir. First-order methods in optimization. SIAM, 2017.
[13] Behmardi, Behrouz and Raich, Raviv and Hero, Alfred O. “Entropy esti-
mation using the principle of maximum entropy”. In: 2011 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP).
2011, pp. 2008–2011.
[14] Bender, Steven R and Smith, Andrew E. “A geometric airspace complexity
metric for unmanned aerial systems”. In: Journal of Intelligent & Robotic
Systems 83.1 (2016), pp. 121–133.
[15] Bing, Dong. “Aircrafts monitoring and early warning based on air traffic
complexity measure analysis”. In: The Open Automation and Control Sys-
tems Journal 6 (2014), pp. 1563–1569.
[16] Bogert, Kenneth. Notes on Generalizing the Maximum Entropy Principle to
Uncertain Data. 2022. arXiv: 2109.04530 [[Link]].
[17] Boomsma, Wouter and Ferkinghoff-Borg, Jesper and Lindorff-Larsen, Kresten.
“Combining experiments and simulations using the maximum entropy prin-
ciple”. In: PLoS computational biology 10.2 (2014), e1003406.
[18] Borwein, Jonathan and Lewis, Adrian. Convex Analysis. Springer, 2006.
[19] Borwein, Jonathan M and Lewis, Adrian S. “Partially finite convex program-
ming, Part I: Quasi relative interiors and duality theory”. In: Mathematical
Programming 57.1 (1992), pp. 15–48.
[20] Borwein, Jonathan M and Lewis, Adrian S. “Partially finite convex program-
ming, part II: Explicit lattice models”. In: Mathematical Programming 57.1
(1992), pp. 49–83.
[21] Botea, Mariana and Ganesh, Arvind. “Prediction of air traffic complexity us-
ing machine learning algorithms”. In: Journal of Air Transport Management
52 (2016), pp. 11–21.
[22] Broyden, Charles George. “The convergence of a class of double-rank mini-
mization algorithms 1. general considerations”. In: IMA Journal of Applied
Mathematics 6.1 (1970), pp. 76–90.

[23] Burg, John Parker. “The relationship between maximum entropy spectra
and maximum likelihood spectra”. In: Geophysics 37.2 (1972), pp. 375–376.
[24] Cai, Guowei and Chen, Ben M and Lee, Tong Heng. “Coordinate systems
and transformations”. In: Unmanned rotorcraft systems (2011), pp. 23–34.
[25] Caticha, Ariel and Giffin, Adom. “Updating probabilities”. In: AIP Confer-
ence Proceedings. Vol. 872. 1. American Institute of Physics. 2006, pp. 31–
42.
[26] Caticha, Ariel and Mohammad-Djafari, Ali and Bercher, Jean-François and
Bessière, Pierre. “Entropic Inference”. In: AIP Conference Proceedings. AIP,
2011.
[27] Chen, Jeng-Shyang and Wu, Chen-Chien and Wang, Chien-Hua and Su, Yu-
Chih and Lee, Cheng-Che. “Agent-based approach for assessing unmanned
aircraft systems integration in the national airspace system”. In: IEEE Ac-
cess 5 (2017), pp. 11151–11160.
[28] Chen, Yunfei and Chen, Linlin and Yao, Wei and Sun, Jian. “A novel machine
learning approach to air traffic complexity prediction and management”. In:
Transportation Research Part C: Emerging Technologies 102 (2019), pp. 251–
266.
[29] Chen, Zhengmao and Guo, Dongyue and Lin, Yi. “A deep Gaussian process-
based flight trajectory prediction approach and Its application on conflict
detection”. In: Algorithms 13.11 (2020), p. 293. issn: 1999-4893.
[30] Cheng, Cheng et al. “Machine learning-aided trajectory prediction and con-
flict detection for internet of aerial vehicles”. In: IEEE Internet of Things
Journal (2021). issn: 2327-4662, 2372-2541.
[31] Chung, Junyoung and Gülçehre, Çaglar and Cho, KyungHyun and Bengio,
Yoshua. “Gated feedback recurrent neural networks”. In: CoRR abs/1502.02367
(2015). arXiv: 1502.02367.
[32] Coppola, Gianluca and Mandolini, Marco and Pieroni, Alessio. “Airspace
complexity assessment using topological methods”. In: IEEE Access 8 (2020),
pp. 75711–75720.
[33] National Research Council, Division on Engineering and Physical Sciences, Aeronautics and Space Engineering Board, Committee on Autonomy Research for Civil Aviation. “Autonomy research for civil aviation: Toward a new era of flight”. In: National Academies Press 2.3 (2014).
[34] De Leege, Arjen and Paassen, Marinus van and Mulder, Max. “A machine
learning approach to trajectory prediction”. In: AIAA Guidance, Navigation,
and Control (GNC) Conference. 2013, pp. 1–10.

[35] Degas, Augustin et al. “A survey on artificial intelligence (AI) and explain-
able ai in air traffic management: Current trends and development with
future research trajectory”. In: Applied Sciences 12.3 (2022), p. 1295. issn:
2076-3417.
[36] Delahaye, Daniel and García, Adrían and Lavandier, Julien and Chaimatanan,
Supatcha and Soler, Manuel. “Air traffic complexity map based on linear dy-
namical systems”. In: Aerospace 9.5 (2022). issn: 2226-4310.
[37] Delahaye, Daniel and Puechmorel, Stéphane. “Air traffic complexity: To-
wards an intrinsic metric”. In: Proceeding of the 3rd USA/Europe Air Traffic
Management R and D Seminar. 2000.
[38] Delahaye, Daniel and Puechmorel, Stéphane. Modeling and optimization of
air traffic. John Wiley & Sons, 2013. isbn: 9781848215955.
[39] Delahaye, Daniel and Puechmorel, Stephane. “Air traffic complexity based
on dynamical systems”. In: 49th IEEE Conference on Decision and Control
(CDC). Atlanta, GA, USA: IEEE, Dec. 2010, pp. 2069–2074. isbn: 978-1-
4244-7745-6.
[40] Dias, Fernando HC and Rey, David. “Robust aircraft conflict resolution un-
der trajectory prediction uncertainty”. In: Operations Research Letters 50.5
(2022), pp. 503–508.
[41] Ding, Jingtao et al. Artificial intelligence for complex network: Potential,
methodology and application. 2024. arXiv: 2402.16887 [[Link]].
[42] Dudík, Miroslav and Phillips, Steven J and Schapire, Robert E. “Maximum
entropy density estimation with generalized regularization and an applica-
tion to species distribution modeling”. In: Journal of Machine Learning Re-
search (2007), pp. 1217–1260.
[43] Facchinei, Francisco and Lucidi, Stefano and Palagi, Laura. “A truncated
Newton algorithm for large scale box constrained optimization”. In: SIAM
Journal on Optimization 12 (1999).
[44] Fang, Jianqiang and Li, Xiaoyan and Wei, Na and Lv, Xiaofei. “Agent-based
simulation of collaborative decision making for air traffic management”. In:
IEEE Access 7 (2019), pp. 135695–135707.
[45] Fu, Jun and Zhou, Wei and Chen, Zhibo. “Bayesian spatio-temporal graph
convolutional network for traffic forecasting”. In: arXiv:2010.07498 [cs] (2020).
arXiv: 2010.07498.
[46] Gao, Yi and Tang, Tao and Li, Xiaoyan and Liu, Fang. “Agent-based ap-
proach for modeling and simulation of air traffic management in en-route
airspace”. In: IEEE Access 7 (2019), pp. 61689–61699.

[47] Gao, Zhenyu and Yu, Yue and Wei, Qinshuang and Topcu, Ufuk and Clarke,
John-Paul. Noise-aware and equitable urban air traffic management: An op-
timization approach. 2024. arXiv: 2401.00806 [[Link]].
[48] Gatto, Riccardo. “Some computational aspects of the generalized von Mises
distribution”. In: Statistics and Computing 18 (2008), pp. 321–331.
[49] Gatto, Riccardo and Jammalamadaka, Sreenivasa Rao. “The generalized von
Mises distribution”. In: Statistical Methodology 4.3 (2007), pp. 341–353. issn:
1572-3127.
[50] Girardin, Valerie. “Méthodes de réalisation de produit scalaire et de prob-
lème de moments avec maximisation d’entropie”. In: Studia Mathematica
3.124 (1997), pp. 199–213.
[51] Gomes-Gonçalves, Erika and Gzyl, Henryk and Mayoral, Silvia. “Density
reconstructions with errors in the data”. In: Entropy 16.6 (2014), pp. 3257–
3272.
[52] Gresele, Luigi and Marsili, Matteo. “On maximum entropy and inference”.
In: Entropy 19.12 (2017), pp. 1–16.
[53] Gzyl, Henryk. Joint probabilities under expected value constraints, trans-
portation problems, maximum entropy in the mean, and geometry in the
space of probabilities. 2021. arXiv: 2109.01166 [[Link]].
[54] Hestenes, Magnus R. “The conjugate-gradient method for solving linear
systems”. In: Numerical analysis 6 (1956), pp. 83–102.
[55] Hiriart-Urruty, Jean-Baptiste and Lemaréchal, Claude. Fundamentals of con-
vex analysis. Springer Science & Business Media, 2004.
[56] Hochreiter, Sepp and Schmidhuber, Jürgen. “Long short-term memory”. In:
Neural computation 9.8 (1997), pp. 1735–1780.
[57] Hong, Youkyung and Kim, Youdan and Lee, Keumjin. “Application of com-
plexity map to reduce air traffic complexity in a sector”. In: AIAA Guidance,
Navigation, and Control Conference. American Institute of Aeronautics and
Astronautics, 2014. isbn: 978-1-62410-317-9.
[58] Huang, Chiou-Jye and Kuo, Ping-Huan. “A deep CNN-LSTM model for
particulate matter (PM2.5) forecasting in smart cities”. In: Sensors 18.7
(2018).
[59] Huang, Jin and Ding, Weijie. “Aircraft trajectory prediction based on Baye-
sian optimised temporal convolutional network–bidirectional gated recurrent
unit hybrid neural network”. In: International Journal of Aerospace Engineer-
ing 2022 (2022). Ed. by Sobel, Kenneth M., pp. 1–19.

[60] Jaynes, Edwin T. “Information theory and statistical mechanics”. In: Phys-
ical review 106.4 (1957), p. 620.
[61] Jaynes, Edwin T. Probability theory: The logic of science. Cambridge uni-
versity press, 2003.
[62] Jones, Donald R. “A taxonomy of global optimization methods based on
response surfaces”. In: Journal of global optimization 21 (2001), pp. 345–
383.
[63] Juntama, Paveen and Chaimatanan, Supatcha and Alam, Sameer and Dela-
haye, Daniel. “A distributed metaheuristic approach for complexity reduc-
tion in air traffic for strategic 4D trajectory optimization”. In: International
Conference on Artificial Intelligence and Data Analytics for Air Transporta-
tion. 2020, pp. 1–9.
[64] Kagan Abram, M. and Linnik Yuri, V. and Radhakrishna Rao, C. Charac-
terization problems in mathematical statistics. 1973.
[65] Khayyam, Hamid and Yildirim, Süleyman and Yücesan, Enver. “Compar-
ative evaluation of air traffic complexity metrics for en-route airspace”. In:
Transportation Research Part C: Emerging Technologies 119 (2020), p. 102725.
[66] Lakshmanan, Rajmadan and Pichler, Alois. “Soft Quantization Using En-
tropic Regularization”. In: Entropy 25.10 (2023), p. 1435.
[67] Laudeman, I. and Shelden, S. “Dynamic density: An air traffic management met-
ric”. In: Technical Report, NASA/TM-1998-112226; NASA: Boulder, CO,
USA (1998).
[68] Lee, Keumjin and Feron, Eric and Pritchett, Amy. “Describing airspace com-
plexity: Airspace response to disturbances”. In: Journal of Guidance, Con-
trol, and Dynamics 32.1 (2009), pp. 210–222. issn: 0731-5090, 1533-3884.
[69] Li, Chengyu and Chen, Qian and Yan, Xiaoming and Wang, Jin and Lin,
Xingwei. “Learning air traffic as images: A deep convolutional neural network
for airspace operation complexity evaluation”. In: IEEE Access 8 (2020),
pp. 19732–19740.
[70] Li, Huanhuan and Jiao, Hang and Yang, Zaili. “AIS data-driven ship tra-
jectory prediction modelling and analysis based on machine learning and
deep learning methods”. In: Transportation Research Part E: Logistics and
Transportation Review 175 (July 2023), pp. 1–39. issn: 13665545.
[71] Li, Qiang and Guan, Xinjia and Liu, Jinpeng. “A CNN-LSTM framework
for flight delay prediction”. In: Expert Systems with Applications 227 (Oct.
2023), pp. 1–16. issn: 09574174.

[72] Li, Weigang and Alves, CJP and Omar, N. “Knowledge-based system for
air traffic flow management: Timetable rescheduling and centralized flow
control”. In: Transactions on Information and Communications Technologies
(1993), pp. 655–670.
[73] Liu, Hong and Lin, Yi and Chen, Zhengmao and Guo, Dongyue and Zhang,
Jianwei and Jing, Hailong. “Research on the air traffic flow prediction using
a deep learning approach”. In: IEEE Access 7 (2019), pp. 148019–148030.
[74] Liu, Hua and Li, Xiaoyan and Wei, Na and Lv, Xiaofei. “Agent-Based Simu-
lation of Aircraft Taxiing Operations”. In: IEEE Access 5 (2017), pp. 18936–
18945.
[75] Liu, Yan and Zhang, Jiazhong. “Predicting traffic flow in local area networks
by the largest Lyapunov exponent”. In: Entropy 18.1 (2016), p. 32.
[76] Ma, Lan and Tian, Shan. “A hybrid CNN-LSTM model for aircraft 4D
trajectory prediction”. In: IEEE Access 8 (2020), pp. 134668–134680. issn:
2169-3536.
[77] Maksimov, Vyacheslav Mikhailovich. “Necessary and sufficient statistics for
the family of shifts of probability distributions on continuous bicompact
groups”. In: Theory of Probability & Its Applications 12.2 (1967), pp. 267–
280.
[78] Maréchal, Pierre. “A note on entropy optimization”. In: Approximation, Op-
timization and Mathematical Economics. Springer, 2001, pp. 205–211.
[79] Maréchal, Pierre. “On the Principle of Maximum Entropy on the Mean as
a methodology for the regularization of inverse problems”. In: Probability
Theory and Mathematical Statistics (1998).
[80] Maréchal, Pierre and Navarrete, Yasmín and Davis, Sergio. “On the founda-
tions of the maximum entropy principle using Fenchel duality for Shannon
and Tsallis entropies”. In: Physica Scripta 99.7 (2024), p. 075265.
[81] Marwala, Tshilidzi and Nelwamondo, Fulufhelo V. “A multi-agent system
for air traffic flow management: A hybrid approach”. In: IEEE Transactions
on Intelligent Transportation Systems 22.7 (2020), pp. 4369–4381.
[82] Mirikitani, Derrick and Nikolaev, Nikolay. “Recursive Bayesian recurrent
neural networks for time-series modeling”. In: Transactions on Neural Net-
works 21.2 (2010), pp. 262–274.
[83] Mondoloni, Stephane and Liang, Diana. “Airspace Fractal Dimensions and
Applications”. In: USA/Europe ATM Seminar. Dec. 2001.

[84] Moré, Jorge J. “Recent developments in algorithms and software for trust
region methods”. In: Mathematical Programming The State of the Art: Bonn
1982 (1983), pp. 258–287.
[85] Nelder, John A and Mead, Roger. “A simplex method for function minimiza-
tion”. In: The computer journal 7.4 (1965), pp. 308–313.
[86] Nghiem, Thi-Lich and Le, Viet-Duc and Le, Thi-Lan and Delahaye, Daniel
and Maréchal, Pierre. “Angular probability density reconstruction by Max-
imum Entropy”. preprint. July 2024.
[87] Pang, Yutian and Xu, Nan and Liu, Yongming. “Aircraft Trajectory Pre-
diction using LSTM neural network with embedded convolutional layer”.
In: Annual Conference of the PHM Society 11.1 (2019). issn: 2325-0178,
2325-0178.
[88] Pang, Yutian and Zhao, Xinyu and Hu, Jueming and Yan, Hao and Liu,
Yongming. “Bayesian Spatio-Temporal grAph tRansformer network (B-STAR)
for multi-aircraft trajectory prediction”. In: Knowledge-Based Systems 249
(Aug. 2022), p. 108998. issn: 09507051.
[89] Pérez Moreno, Francisco and Gómez Comendador, Víctor Fernando and
Delgado-Aguilera Jurado, Raquel and Zamarreño Suárez, María and Antulov-
Fantulin, Bruno and Arnaldo Valdés, Rosa María. “How has the concept of
air traffic complexity evolved? Review and analysis of the state of the art of
air traffic complexity”. In: Applied Sciences 14.9 (2024), pp. 1–21.
[90] Polpo, Adriano and Stern, Julio and Louzada, Francisco and Izbicki, Rafael
and Takada, Hellinton. “Bayesian inference and maximum entropy meth-
ods in science and engineering”. In: Springer Proceedings in Mathematics &
Statistics (2018).
[91] Powell, Michael JD. A direct search optimization method that models the
objective and constraint functions by linear interpolation. Springer, 1994.
[92] Powell, Michael JD. “An efficient method for finding the minimum of a func-
tion of several variables without calculating derivatives”. In: The computer
journal 7.2 (1964), pp. 155–162.
[93] Prandini, Maria and Piroddi, Luigi and Puechmorel, Stephane and Brazdilova,
Silvie Luisa. “Toward air traffic complexity assessment in new generation air
traffic management systems”. In: IEEE Transactions on Intelligent Trans-
portation Systems 12.3 (2011), pp. 809–818. issn: 1524-9050, 1558-0016.
[94] Qiao, Sun and Han, Nan and Zhu, Xin and Shu, Hong and Zheng, Jiao and
Yuan, Chang. “A dynamic trajectory prediction algorithm based on Kalman
Filter”. In: Acta Electronica Sinica 46 (2018), pp. 418–423.

[95] Qin, Wanting and Tang, Jun and Lao, Songyang. “DeepFR: A trajectory
prediction model based on deep feature representation”. In: Information Sci-
ences 604 (2022), pp. 226–248.
[96] Razvan, Bucuroiu and Stéphanie, Vincent. European Network Operations
Plan 2019 (2024) Rolling Seasonal Plan. Tech. rep. EUROCONTROL, 2019-
2024.
[97] Rényi, Alfréd. Probability theory. Courier Corporation, 2007.
[98] Rey, David and Rapine, Christophe and Fondacci, Remy and El Faouzi,
Nour-Eddin. “Subliminal speed control in air traffic management: Optimiza-
tion and simulation”. In: Transportation Science 50 (2016), pp. 240–262.
[99] Rivero, Cristian Rodriguez et al. “Time series forecasting using Recurrent
neural networks modified by Bayesian inference in the learning process”. In:
2019 IEEE Colombian Conference on Applications in Computational Intel-
ligence (ColCACI). Barranquilla, Colombia, 2019, pp. 1–6.
[100] Rockafellar, R Tyrrell. Convex analysis. Vol. 18. Princeton university press,
1970.
[101] Rockafellar, R Tyrrell. “Convex integral functionals and duality”. In: Con-
tributions to nonlinear functional analysis. Elsevier, 1971, pp. 215–236.
[102] Rockafellar, R Tyrrell. “Integral functionals, normal integrands and mea-
surable selections”. In: Nonlinear Operators and the Calculus of Variations.
Springer, 2006, pp. 157–207.
[103] Rockafellar, Ralph. “Integrals which are convex functionals”. In: Pacific
journal of mathematics 24.3 (1968), pp. 525–539.
[104] Rooij, Gijs de and Stienstra, Amber and Borst, Clark and Tisza, Adam B
and Paassen, Marinus M van and Mulder, Max. “Contributing factors to
flight-centric complexity in en-route air traffic control”. In: Proceedings of
the 15th USA/Europe Air Traffic Management Research and Development
Seminar. 2023.
[105] Salinas, David and Flunkert, Valentin and Gasthaus, Jan and Januschowski,
Tim. “DeepAR: Probabilistic forecasting with autoregressive recurrent net-
works”. In: International Journal of Forecasting 36.3 (2020), pp. 1181–1191.
issn: 0169-2070.
[106] Shafienya, Hesam and Regan, Amelia. “4D flight trajectory prediction based
on ADS-B data: A comparison of CNN-GRU models”. In: Aerospace Con-
ference (AERO). 2022, pp. 01–12.

[107] Shafienya, Hesam and Regan, Amelia C. “4D flight trajectory prediction
using a hybrid deep learning prediction method based on ADS-B technology:
A case study of Hartsfield–jackson Atlanta international airport (ATL)”. In:
Transportation Research Part C: Emerging Technologies 144 (2022), pp. 1–
12.
[108] Shang, Tongfei and Xiao, Kunpeng and Han, Kun. “Target operator tra-
jectory prediction method based on attention mechanism and LSTM”. In:
Journal of Physics: Conference Series 2037.1 (2021), pp. 1–6. issn: 1742-
6588.
[109] Shi, Xingjian and Chen, Zhourong and Wang, Hao and Yeung, Dit-Yan and
Wong, Wai-Kin and Woo, Wang-chun. “Convolutional LSTM network: A
machine learning approach for precipitation nowcasting”. In: CoRR (2015).
[110] Shi, Xingjian and Chen, Zhourong and Wang, Hao and Yeung, Dit-Yan and
Wong, Wai-kin and Woo, Wang-chun. “Convolutional LSTM network: A
machine learning approach for precipitation nowcasting”. In: arXiv (2015).
eprint: 1506.04214 ([Link]).
[111] Shi, Zhiyuan and Xu, Min and Pan, Quan. “4D flight trajectory predic-
tion with constrained LSTM network”. In: IEEE Transactions on Intelligent
Transportation Systems 22.11 (2021), pp. 7242–7255. issn: 1524-9050.
[112] Siddique, Talha and Mahmud, Shaad and Keesee, Amy and Ngwira, Chigomezyo
and Connor, Hyunju. “A survey of uncertainty quantification in machine
learning for space weather prediction”. In: Geosciences 12.1 (2022). issn:
2076–3263.
[113] Spencer, D.A. “Applying artificial intelligence techniques to air traffic con-
trol automation”. In: The Lincoln Laboratory Journal 2.3 (1989), pp. 537–
554.
[114] Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. “Sequence to sequence
learning with neural networks”. In: arXiv (2014). eprint: 1409.3215 ([Link]).
[115] Teboulle, Marc. “A simplified view of first order methods for optimization”.
In: Mathematical Programming 170.1 (2018), pp. 67–96.
[116] Tran, Phu N. and Nguyen, Hoang Q. V. and Pham, Duc-Thinh and Alam,
Sameer. “Aircraft trajectory prediction with enriched intent using Encoder-
Decoder architecture”. In: IEEE Access 10 (2022), pp. 17881–17896.
[117] Tsallis, Constantino and Mendes, Renio S and Plastino, Anel R. “The role of
constraints within generalized nonextensive statistics”. In: Physica A: Sta-
tistical Mechanics and its Applications 261.3-4 (1998), pp. 534–554.

[118] Varun S., Sudarsanan. “Deep learning framework for trajectory prediction
and in-time prognostics in the terminal airspace”. PhD thesis. Purdue Uni-
versity Graduate School, 2022.
[119] Vidosavljevic, A. and Delahaye, D. and Sunil, E. and Bussink, F. and Hoek-
stra, J. “Complexity analysis of the concepts of urban airspace design for
METROPOLIS project”. In: ENRI Int. Workshop on ATM/CNS. Tokyo,
Japan. 2015.
[120] Wan, Guihong and Maung, Crystal and Schweitzer, Haim. “Improving the
accuracy of principal component analysis by the maximum entropy method”.
In: 31st International Conference on Tools with Artificial Intelligence (IC-
TAI). IEEE. 2019, pp. 808–815.
[121] Wang, Chengzhang and He, Yu and Xu, Jun and Zhou, Chunyang. “A ma-
chine learning approach for air traffic complexity prediction and manage-
ment”. In: Journal of Advanced Transportation 2019 (2019), pp. 1–15.
[122] Wang, Huai-zhi and Li, Gang-qiang and Wang, Gui-bin and Peng, Jian-chun
and Jiang, Hui and Liu, Yi-tao. “Deep learning based ensemble approach for
probabilistic wind power forecasting”. In: Applied Energy 188 (2017), pp. 56–
70. issn: 0306-2619.
[123] Wang, Xing and Jiang, Xinhua and Chen, Lifei and Wu, Yi. “KVLMM: A
trajectory prediction method based on a variable-order Markov model with
kernel smoothing”. In: IEEE Access 6 (2018), pp. 25200–25208.
[124] Wang, Zhengyi and Delahaye, Daniel and Farges, Jean-Loup and Alam,
Sameer. “Complexity optimal air traffic assignment in multi-layer transport
network for urban air mobility operations”. In: Transportation Research Part
C: Emerging Technologies 142 (Sept. 2022), pp. 1–23. issn: 0968090X.
[125] Wang, Zhengyi and Liang, Man and Delahaye, Daniel. “Short-term 4D tra-
jectory prediction using machine learning methods”. In: SID 2017, 7th SESAR
Innovation Days. 2017.
[126] Wei, Na and Li, Xiaoyan and Lv, Xiaofei and Yang, Xiaodong. “Agent-
based simulation of air traffic flow management strategies”. In: IEEE Access
6 (2018), pp. 32216–32228.
[127] West, Jon and Zhang, Yu and Yildirim, Tolga and Blanks, Brenden. “Agent-
based simulation of air traffic control operations for small unmanned aerial
systems”. In: IEEE Access 7 (2019), pp. 56115–56125.

[128] Winter, Joost C de and Verhagen, Wim J and Ellerbroek, Joost. “Agent-
based simulation of collaborative decision making in air traffic flow manage-
ment”. In: IEEE Transactions on Intelligent Transportation Systems 19.9
(2018), pp. 2771–2782.
[129] Wyndemere. “An evaluation of air traffic control complexity”. In: Technical
Report, NASA 2-14284; NASA: Boulder, CO, USA (1996).
[130] Xin, Liu and Hailong, Pei and Jianqiang, Li. “Trajectory prediction based
on particle filter application in mobile robot system”. In: 2008 27th Chinese
Control Conference. 2008, pp. 389–393.
[131] Xu, Zhengfeng and Zeng, Weili and Chu, Xiao and Cao, Puwen. “Multi-
aircraft trajectory collaborative prediction based on social Long Short-Term
Memory network”. In: Aerospace 8.4 (2021). issn: 2226-4310.
[132] Yang, Hang and Liu, Zhonghua and Xiao, Yuguang. “Evaluation of air traffic
complexity using entropy-based metrics”. In: Entropy 21.10 (2019), p. 958.
[133] Yang, Yandong and Hong, Weijun and Li, Shufang. “Deep ensemble learning
based probabilistic load forecasting in smart grids”. In: Energy 189 (2019),
pp. 1–12.
[134] Zalinescu, Constantin. Convex analysis in general vector spaces. World sci-
entific, 2002.
[135] Zeng, Weili and Chu, Xiao and Xu, Zhengfeng and Liu, Yan and Quan,
Zhibin. “Aircraft 4D trajectory prediction in civil aviation: A review”. In:
Aerospace 9.2 (2022). issn: 2226-4310.
[136] Zeng, Weili and Chu, Xiao and Xu, Zhengfeng and Liu, Yan and Quan,
Zhibin. “Aircraft 4D trajectory prediction in civil aviation: A review”. In:
Aerospace 9.2 (Feb. 2022), pp. 1–19. issn: 2226-4310.
[137] Zhang, Rongjie and Wu, Wenjuan and Li, Liang and Li, Zhaowei. “Machine
learning techniques for air traffic complexity evaluation”. In: Transportation
Research Part C: Emerging Technologies 111 (2020), pp. 463–479.
[138] Zhang, Tao and Gao, Yang and Zhang, Chengwei. “Short-term 4D trajectory
prediction based on KF joint EKF parameter identification”. In: Journal of
Civil Aviation University of China 34.5 (2016), pp. 1–4.
[139] Zhang, Xiaoge and Mahadevan, Sankaran. “Bayesian neural networks for
flight trajectory prediction and safety assessment”. In: Decision Support Sys-
tems 131 (Apr. 2020), p. 113246. issn: 01679236.

Appendix
A Computing conjugates
In this appendix, we give computation details for the conjugate functions
involved in the Maximum Entropy problem of Chapter 3. We start with the
conjugate of
\[
g_\circ(\eta_\circ, \eta) = -\frac{\alpha}{2}\, \|z - \eta\|_{\Sigma^{-1}}^2 - \delta_{\{1\}}(\eta_\circ).
\]
We have:
\[
\begin{aligned}
(g_\circ)^\star(\lambda_\circ, \lambda)
&= \inf_{\eta_\circ,\, \eta} \left\{ \langle (\lambda_\circ, \lambda), (\eta_\circ, \eta) \rangle + \frac{\alpha}{2}\, \|z - \eta\|_{\Sigma^{-1}}^2 + \delta_{\{1\}}(\eta_\circ) \right\} \\
&= \inf_{\eta_\circ} \Big\{ \lambda_\circ \eta_\circ + \delta_{\{1\}}(\eta_\circ) \Big\}
 + \inf_{\eta} \left\{ \langle \lambda, \eta \rangle + \frac{\alpha}{2}\, \|z - \eta\|_{\Sigma^{-1}}^2 \right\}.
\end{aligned}
\]
Clearly, the first infimum is attained at η̄◦ = 1, yielding the value λ◦ . The second infimum is attained at the point η̄ where the gradient of the function η ↦ ⟨λ, η⟩ + (α/2)∥z − η∥²_{Σ⁻¹} vanishes. A straightforward computation shows that the gradient of the latter function is equal to λ − αΣ⁻¹(z − η), which implies that
\[
\bar\eta = z - \frac{1}{\alpha}\, \Sigma \lambda.
\]
It follows that
\[
\begin{aligned}
\inf_{\eta} \left\{ \langle \lambda, \eta \rangle + \frac{\alpha}{2}\, \|z - \eta\|_{\Sigma^{-1}}^2 \right\}
&= \left\langle \lambda,\, z - \frac{1}{\alpha} \Sigma \lambda \right\rangle
 + \frac{\alpha}{2} \left\langle \frac{1}{\alpha} \Sigma \lambda,\, \Sigma^{-1} \frac{1}{\alpha} \Sigma \lambda \right\rangle \\
&= \langle \lambda, z \rangle - \frac{1}{2\alpha}\, \langle \lambda, \Sigma \lambda \rangle.
\end{aligned}
\]

Putting things together, we obtain:
\[
(g_\circ)^\star(\lambda_\circ, \lambda) = \lambda_\circ + \langle \lambda, z \rangle - \frac{1}{2\alpha}\, \langle \lambda, \Sigma \lambda \rangle.
\]
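As a sanity check of this closed form, the small sketch below compares it with a direct evaluation of the infimum at the minimizer η̄ found above; all data here are randomly generated and purely illustrative.

```python
import numpy as np

# Sanity check of the closed form on random (illustrative) data.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)            # symmetric positive definite
z, lam = rng.standard_normal(n), rng.standard_normal(n)
lam0, alpha = 0.7, 2.5

eta_bar = z - Sigma @ lam / alpha          # the minimizer found above
r = z - eta_bar
direct = lam0 + lam @ eta_bar + 0.5 * alpha * r @ np.linalg.solve(Sigma, r)
closed = lam0 + lam @ z - lam @ (Sigma @ lam) / (2.0 * alpha)
assert np.isclose(direct, closed)
```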

Next, we address the case where the estimated covariance matrix is sin-
gular. In this case, the inverse covariance matrix does not exist and the
Mahalanobis distance degenerates. We check here that the appropriate for-
mula is that given in Equation (3.6).

Lemma 24. Let Q be a symmetric positive semi-definite matrix and let
\[
\varphi(x) := \frac{1}{2} \langle x, Qx \rangle.
\]
Then, the convex conjugate φ⋆ is given by
\[
\varphi^\star(\xi) =
\begin{cases}
\dfrac{1}{2} \left\langle \xi, Q^\dagger \xi \right\rangle & \text{if } \xi \in \operatorname{ran} Q, \\[1ex]
\infty & \text{otherwise.}
\end{cases}
\]

Proof. By definition,
\[
\varphi^\star(\xi) = \sup_{x} \left\{ \langle \xi, x \rangle - \frac{1}{2} \langle x, Qx \rangle \right\}.
\]
The gradient of the function to be supremized is equal to ξ − Qx. If ξ ∈ ran Q, then the gradient vanishes at any x̄ such that Qx̄ = ξ, for example at x̄ = Q†ξ, since QQ†ξ = ξ (recall that QQ† is the orthogonal projection onto the range of Q). Then,
\[
\varphi^\star(\xi) = \left\langle \xi, Q^\dagger \xi \right\rangle - \frac{1}{2} \left\langle Q^\dagger \xi,\, Q Q^\dagger \xi \right\rangle = \frac{1}{2} \left\langle \xi, Q^\dagger \xi \right\rangle.
\]
If ξ ∉ ran Q, let x◦ be the orthogonal projection of ξ onto ker Q. Since ker Q = (ran Q⋆)⊥ = (ran Q)⊥ and ξ ∉ ran Q, we have x◦ ≠ 0 and ⟨ξ, x◦⟩ = ∥x◦∥² > 0. For every t ∈ R,
\[
\langle \xi, t x_\circ \rangle - \frac{1}{2} \langle t x_\circ,\, Q(t x_\circ) \rangle = t \langle \xi, x_\circ \rangle, \tag{6.1}
\]
since Qx◦ = 0. The right hand side of Equation (6.1) can therefore be made as large as desired, which implies that φ⋆(ξ) = ∞ in this case.

A byproduct of the lemma is that the function in Equation (3.6) is closed, proper and convex. Since such functions are equal to their biconjugates, this justifies that (g◦)⋆ is unchanged, and subsequently so are the dual problems (D) and (D̃).
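A quick numerical illustration of the lemma on a deliberately rank-deficient matrix (all data illustrative): sampled values of the function being supremized never exceed the closed-form value (1/2)⟨ξ, Q†ξ⟩ when ξ ∈ ran Q.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))
Q = B @ B.T                                # PSD with rank 3 < 5
xi = Q @ rng.standard_normal(5)            # xi in ran(Q) by construction

phi_star = 0.5 * xi @ np.linalg.pinv(Q) @ xi
xs = rng.standard_normal((20000, 5)) * 5.0
vals = xs @ xi - 0.5 * np.einsum("ij,jk,ik->i", xs, Q, xs)
assert vals.max() <= phi_star + 1e-9       # phi* is the exact supremum
```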
Finally, we deal with the computation of the convex conjugate of the function
\[
h_\circ(t) =
\begin{cases}
t \ln t & \text{if } t > 0, \\
0 & \text{if } t = 0, \\
\infty & \text{if } t < 0.
\end{cases}
\]

By definition,
\[
h_\circ^\star(\tau) = \sup_{t \in \mathbb{R}} \{ \tau t - h_\circ(t) \}.
\]
Suppose that the supremum is attained at some t̄ > 0. Then
\[
\tau - h_\circ'(\bar t\,) = \tau - \ln \bar t - 1 = 0,
\]
so that t̄ = e^{τ−1}. We see a posteriori that t̄ maximizes the function t ↦ τt − h◦(t), since the supremum cannot be attained on R₋. Therefore,
\[
h_\circ^\star(\tau) = \tau \bar t - h_\circ(\bar t\,) = \tau e^{\tau-1} - e^{\tau-1}(\tau - 1) = e^{\tau-1}.
\]
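The closed form is easy to check numerically; here is a short verification for an arbitrary test value of τ (the bracketing interval is an arbitrary choice):

```python
import numpy as np
from scipy.optimize import minimize_scalar

tau = 0.3  # arbitrary test value
# h*(tau) = sup_{t>0} { tau*t - t*ln(t) }; closed form is exp(tau - 1).
res = minimize_scalar(lambda t: -(tau * t - t * np.log(t)),
                      bounds=(1e-12, 50.0), method="bounded")
assert np.isclose(-res.fun, np.exp(tau - 1.0), atol=1e-6)
```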
