Papers by Abdelkader Baggag
Bioinformatics
Motivation: Finding outliers in RNA-sequencing (RNA-Seq) gene expression (GE) can help in identifying genes that are aberrant and cause Mendelian disorders. Recently developed models for this task rely on modeling RNA-Seq GE data using the negative binomial distribution (NBD). However, some of those models rely on procedures for inferring the NBD's parameters in an unbiased way that are computationally demanding and thus make confounder control challenging, while others rely on less computationally demanding but biased procedures and convoluted confounder control approaches that hinder interpretability. Results: In this article, we present OutSingle (Outlier detection using Singular Value Decomposition), an almost instantaneous way of detecting outliers in RNA-Seq GE data. It uses a simple log-normal approach for count modeling. For confounder control, it uses the recently discovered optimal hard threshold (OHT) method for noise detection, which itself is based on singular value de...
ClassMat: a Matrix of Small Multiples to Analyze the Topology of Multiclass Multidimensional Data
2022 Topological Data Analysis and Visualization (TopoInVis)

Data Mining and Knowledge Discovery
Network proximity computations are among the most common operations in various data mining applications, including link prediction and collaborative filtering. A common measure of network proximity is the Katz index, which has been shown to be among the best-performing path-based link prediction algorithms. With the emergence of very large network databases, such proximity computations become an important part of query processing in these databases. Consequently, significant effort has been devoted to developing algorithms for efficient computation of the Katz index between a given pair of nodes or between a query node and every other node in the network. Here, we present LRC-Katz, an algorithm based on indexing and low rank correction to accelerate Katz index based network proximity queries. Using a variety of very large real-world networks, we show that LRC-Katz outperforms the fastest existing method, Conjugate Gradient, for a wide range of parameter values. We also show that this acceleration in the computation of Katz index
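As an illustrative aside (this is not the LRC-Katz algorithm): the Katz index between nodes i and j is commonly defined as the sum, over walk lengths k, of beta^k times the number of length-k walks from i to j. The sum converges when beta is smaller than the reciprocal of the spectral radius of the adjacency matrix. Below is a minimal pure-Python sketch of a single-source query by truncated summation. The function name `katz_from`, the toy path graph, and the default of 200 terms are assumptions made for this example.

```python
def katz_from(adj, source, beta, n_terms=200):
    """Katz scores from `source` to every node, by truncated power series.

    adj is a dense adjacency matrix (list of lists); beta must be below
    1 / spectral_radius(adj) for the series to converge.
    """
    n = len(adj)
    # walk[j] holds beta^k * (number of length-k walks from source to j)
    walk = [0.0] * n
    walk[source] = 1.0
    scores = [0.0] * n
    for _ in range(n_terms):
        # Extend every walk by one edge and damp it by beta.
        walk = [beta * sum(walk[i] * adj[i][j] for i in range(n))
                for j in range(n)]
        scores = [s + w for s, w in zip(scores, walk)]
    return scores


# Toy example (hypothetical data): the path graph 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = katz_from(path, source=0, beta=0.1)
```

For this small graph the series can be checked against the closed form (I - beta*A)^(-1) - I. The direct neighbor (node 1) scores higher than the two-hop node (node 2), which is the behavior link prediction relies on.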
Deep Reinforcement Learning for Traffic Light Optimization
2018 IEEE International Conference on Data Mining Workshops (ICDMW)
Deep Reinforcement Learning has the potential of practically addressing one of the most pressing problems in road traffic management, namely that of traffic light optimization (TLO). The objective of the TLO problem is to set the timings (phase and duration) of traffic lights in order to minimize the overall travel time of the vehicles that traverse the road network. In this paper, we introduce a new reward function that is able to decrease travel time in a micro-simulator environment. More specifically, our reward function simultaneously takes the traffic flow and traffic delay into account in order to provide a solution to the TLO problem. We use both Deep Q-Learning and Policy Gradient approaches to solve the resulting reinforcement learning problem.

Operated by Universities Space Research Association
Abstract. We consider the propagation of disturbances in a non-uniform mean flow by high-order numerical simulation. Monopole and dipole acoustic, vortical and entropy pulses are embedded in an incompressible stagnation flow, which is taken as a prototype of a non-uniform low Mach number mean flow near a rigid wall at high angle of attack. Numerical results are discussed in terms of baroclinic generation of disturbance vorticity, which appears to be a key process in energy transfer between a non-uniform mean flow and a propagating disturbance. These phenomena lead to amplification of sound waves originating from an acoustic pulse. Vorticity generation governs wave radiation of a near-wall entropy pulse and makes the radiated waves similar to those from a vortical dipole. Interaction of initial pulse vorticity with generated vorticity leads to various radiated wave patterns discussed here. Key words. aeroacoustics of non-uniform flows, stagnation flow, wave amplification, vortical flows, monopole and dipole...

Toward Perception-Based Evaluation of Clustering Techniques for Visual Analytics
2019 IEEE Visualization Conference (VIS), 2019
Automatic clustering techniques play a central role in Visual Analytics by helping analysts to discover interesting patterns in high-dimensional data. Evaluating these clustering techniques, however, is difficult due to the lack of universal ground truth. Instead, clustering approaches are usually evaluated based on a subjective visual judgment of low-dimensional scatterplots of different datasets. As clustering is an inherent human-in-the-loop task, we propose a more systematic way of evaluating clustering algorithms based on quantification of human perception of clusters in 2D scatterplots. The core question we are asking is to what extent existing clustering techniques align with clusters perceived by humans. To do so, we build on a dataset from a previous study [1], in which 34 human subjects labeled 1000 synthetic scatterplots in terms of whether they could see one or more than one cluster. Here, we use this dataset to benchmark state-of-the-art clustering techniques in terms of how closely they agree with these human judgments. More specifically, we assess 1437 variants of K-means, Gaussian Mixture Models, CLIQUE, DBSCAN, and Agglomerative Clustering techniques on these benchmark data. We get unexpected results. For instance, CLIQUE and DBSCAN are at best in slight agreement on this basic cluster counting task, while model-agnostic Agglomerative Clustering can reach substantial agreement with human subjects, depending on the variant. We discuss how to extend this perception-based clustering benchmark approach, and how it could lead to the design of perception-based clustering techniques that would better support more trustworthy and explainable models of cluster patterns.

ArXiv, 2021
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new data-driven technique called ClustRank that ranks scatterplots according to visible grouping patterns. Our model first encodes scatterplots in the parametric space of a Gaussian Mixture Model, and then uses a classifier trained on human judgment data to estimate the perceptual complexity of grouping patterns. The numbers of initial mixture components and final combined groups determine the rank of the scatterplot. ClustRank improves on existing VQM techniques by mimicking human judgments on two-Gaussian cluster patterns, and gives more accuracy when ranking general cluster patterns in scatterplots. We demonstrate its benefit by analyzing kinship data for genome-wide association studies, a domain in which experts rely on the visual analysis of large sets of scatterplots. We make the three benchmark datasets and the C...

In an effort to curb air pollution, the city of Delhi (India), known to be one of the most populated, polluted, and congested cities in the world, ran a trial experiment in two phases of 15 days each. During the experiment, most four-wheeled vehicles were constrained to move on alternate days based on whether their plate numbers ended with odd or even digits. While the local government of Delhi, represented by A. Kejriwal (leader of the AAP party), advocated for the benefits of the experiment, the prime minister of India, N. Modi (former leader of the BJP), argued that the initiative was inefficient. This has led to a strong polarization of public opinion towards the OddEven experiment. This real-world urban experiment provided the scientific community with a unique opportunity to study the impact of political leaning on human perception at a large scale. We collect data about pollution and traffic congestion to measure the real effectiveness of the experiment. We use Twitter to...

Advanced Computation of Sparse Precision Matrices for Big Data
The precision matrix is the inverse of the covariance matrix. Estimating large sparse precision matrices is an interesting and challenging problem in many fields of science, engineering, and the humanities, and in machine learning problems in general. Recent applications often encounter high dimensionality with a limited number of data points, leading to a number of covariance parameters that greatly exceeds the number of observations, and hence to the singularity of the covariance matrix. Several methods have been proposed to deal with this challenging problem, but there is no guarantee that the obtained estimator is positive definite. Furthermore, in many cases, one needs to capture some additional information on the setting of the problem. In this paper, we introduce a criterion that ensures the positive definiteness of the precision matrix and we propose the inner-outer alternating direction method of multipliers as an efficient method for estimating it. We show that the convergence of the a...
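The singularity mentioned in this abstract can be seen on a toy example: with n observations in p dimensions, the sample covariance matrix has rank at most n - 1, so it cannot be inverted when n <= p. The following sketch only illustrates that fact and is unrelated to the estimation method proposed in the paper. The helper names `sample_covariance` and `det3` and the data are hypothetical, and exact rational arithmetic is used so that the determinant is exactly zero rather than merely small.

```python
from fractions import Fraction


def sample_covariance(rows):
    """Unbiased sample covariance (divides by n - 1) with exact arithmetic."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical data: 2 observations in 3 dimensions (n < p).
few = [[1, 2, 0], [3, 1, 4]]
# Hypothetical data: 4 observations in 3 dimensions (n > p).
enough = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]

singular = det3(sample_covariance(few)) == 0          # no inverse exists
invertible = det3(sample_covariance(enough)) != 0     # inverse exists
```

With too few observations the precision matrix simply does not exist as a plain inverse, which is why regularized or constrained estimators of the kind described in the abstract are needed.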
Stream Experiments: Toward Latency Hiding in GPGPU
Parallel and Distributed Computing and Networks, 2010

Gas Particles Modeling and Simulation
Two-phase gas-particle flows are encountered in many engineering applications, including risers and downers, pneumatic conveying systems, particle transport and erosion, deposition in turbo-machinery, and so on. Numerous questions arise when modelling such flows: How effective is the contacting between gas and particles? How fast is the radial dispersion? How do these issues change with scale-up? Prediction of particle motion, deposition rate and interaction with the fluid is crucial to designing cost-effective industrial processes. Many relevant aspects are still under research and have not yet been resolved satisfactorily, among them the formulation of particle drag forces, suitable turbulence models, particle volume fraction boundary conditions including the rebounding wall condition, coupling with the Navier-Stokes equations, and a stable and robust numerical discretization. Flow is invariably unsteady with a wide range of length and time scales as particles are continu...
Health Lifestyle Data-Driven Applications Using Pervasive Computing
Big Data, Big Challenges: A Healthcare Perspective, 2019
In this chapter, we give an overview of the current and future impact of pervasive computing in the health domain. In this context, we focus on some of the crucial aspects of data-driven applications. We present examples of recently proposed lifestyle applications and highlight the ethical issues with such applications. We discuss challenges and opportunities in the process of transforming the raw data collected from wearables and mobile devices into insights. Finally, the last part of this chapter provides insights into the socio-ethical aspects that arise in the context of data-driven health applications based on pervasive computing technologies.

PROSPECT: A web server for predicting protein histidine phosphorylation sites
Journal of Bioinformatics and Computational Biology, 2020
Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation also occurs in more complex organisms, its role in mammalian cells has remained largely uncharted. Thus, it is highly desirable to develop computational tools that are able to identify histidine phosphorylation sites. Result: Here, we introduce PROSPECT, which enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. Our tool is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and a random forest-based classifier. Three features, including the one-of-K coding, enhanced grouped amino acids content (EGAAC) and composition of k-spaced amino acid group pairs (CKSAAGP) encoding, were taken as the input to three classifiers, respectively. Our results show that it is able to acc...

Entropy, 2020
A restricted Boltzmann machine is a generative probabilistic graphical network. The probability of finding the network in a certain configuration is given by the Boltzmann distribution. Given training data, its learning is done by optimizing the parameters of the energy function of the network. In this paper, we analyze the training process of the restricted Boltzmann machine in the context of statistical physics. As an illustration, for small size bar-and-stripe patterns, we calculate thermodynamic quantities such as entropy, free energy, and internal energy as a function of the training epoch. We demonstrate the growth of the correlation between the visible and hidden layers via the subadditivity of entropies as the training proceeds. Using the Monte-Carlo simulation of trajectories of the visible and hidden vectors in the configuration space, we also calculate the distribution of the work done on the restricted Boltzmann machine by switching the parameters of the energy function. We ...
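To make the thermodynamic quantities in this abstract concrete, here is a minimal sketch for a tiny binary restricted Boltzmann machine with the standard energy E(v, h) = -(a·v + b·h + v·W·h) at unit temperature. It enumerates all configurations exactly, which is only feasible for very small networks. The function names and the parameter values are assumptions chosen for illustration and are not taken from the paper.

```python
import math
from itertools import product


def energy(v, h, W, a, b):
    """Standard RBM energy for binary visible vector v and hidden vector h."""
    bias = sum(ai * vi for ai, vi in zip(a, v)) + sum(bj * hj for bj, hj in zip(b, h))
    coupling = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -(bias + coupling)


def thermodynamics(W, a, b):
    """Exact free energy F, internal energy U and entropy S at temperature 1."""
    states = [(v, h)
              for v in product((0, 1), repeat=len(a))
              for h in product((0, 1), repeat=len(b))]
    energies = [energy(v, h, W, a, b) for v, h in states]
    weights = [math.exp(-e) for e in energies]
    Z = sum(weights)                         # partition function
    probs = [w / Z for w in weights]         # Boltzmann distribution
    F = -math.log(Z)
    U = sum(p * e for p, e in zip(probs, energies))
    S = -sum(p * math.log(p) for p in probs)
    return F, U, S


# Hypothetical parameters: 2 visible units, 1 hidden unit.
W = [[0.5], [-1.0]]
a = [0.1, -0.2]
b = [0.3]
F, U, S = thermodynamics(W, a, b)
```

At unit temperature the three quantities satisfy F = U - S, which gives a quick sanity check on the enumeration.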

Deep Learning for Traffic Analytics Application FIFA2022
Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3, 2018
As urban data keeps getting bigger, deep learning is coming to play a key role in providing big data predictive analytics solutions. We are interested in developing a new generation of deep learning based computational technologies that predict traffic congestion and support crowd management. In this work, we are mainly interested in efficiently predicting future traffic with high accuracy. The proposed deep learning solution reveals the latent (hidden) structure common to different cities in terms of dynamics. The data-driven insights of traffic analytics will help stakeholders, e.g., security forces, stadium management teams, and travel agencies, to make fast and reliable decisions to deliver the best possible experience for visitors. Current traffic data sources in Qatar are incomplete as sensors are not yet permanently deployed for data collection. The following topics are being addressed:
Predictive Crowd and Vehicles Traffic Analytics: Forecasting the flow of crowds and vehicles is of great importance to traffic management, risk assessment and public safety. It is affected by many complex factors, including spatial and temporal dependencies, infrastructure constraints and external conditions (e.g. weather and events). If one can predict the flow of crowds and vehicles in a region, tragedies can be mitigated or prevented by utilizing emergency mechanisms, such as conducting traffic control, sending out warnings, signaling diversion routes or evacuating people, in advance. We propose a deep-learning-based approach to collectively forecast the flow of crowds and vehicles. Deep models, such as Deep-Neural-Networks, are currently the best data-driven techniques to handle heterogeneous data and to discover and predict complex data patterns such as traffic congestion and crowd movements. We will focus in particular on predicting inflow and outflow of crowds or vehicles to and from important areas, tracking the transitions between these regions. We will study different deep architectures to increase the accuracy of the predictive model, and explore ways to integrate spatio-temporal information into these models. We will also study how deep models can be re-used without retraining to handle new data and better scale to large data sets.
What-If Scenarios Modeling: Understanding how congestion or overcrowding at one location can cause ripples throughout a transportation network is vital to pinpoint traffic bottlenecks for congestion mitigation or emergency response preparation. We will use predictive modeling to simulate different states of the transportation network, enabling stakeholders to test different hypotheses in advance. We will use the theory of multi-layer networks to model and then simulate the complex relationship between different but coexisting types of flows (crowd, vehicles) and infrastructures (roads, railways, crossings, passageways, squares…). We will propose a visual analytic platform that will provide the necessary visual handles to generate different cases, navigate through different scenarios, and identify potential bottlenecks, weak points and resilient routes. This visualization platform, connected to the real-time predictive analytic platform, will support stakeholder decisions by automatically matching the current situation to the already explored scenarios and possible emergency plans.
Safety and Evacuation Planning based on Resilience Analytics: Determining the best route to clear congested or overcrowded areas, or new routes to divert traffic and people from such areas, is crucial to maintain high security and safety levels. The visual analytic platform and the predictive model will enable testing and setting up safety and evacuation plans to be applied in case of an upcoming emergency as detected by the predictive analytic platform.
Overall, the proposed approach is independent of the type of flows, i.e., vehicles or people, or infrastructures, as long as proper sensors (magnetic loops, video cameras, GPS tracking, etc.) provide relevant data about these flows (number of people or vehicles per time unit along a route of some layer of the transportation network). The proposed data-driven learning models are efficient, and they adapt to the specificities of the type of flows by updating the relevant parameters during the training phase.

Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3, 2018
Spatiotemporal data related to traffic has become commonplace due to the wide availability of cheap sensors and the rapid deployment of IoT platforms. Yet, this data suffers from several challenges related to sparsity, incompleteness, and noise, which make traffic analytics difficult. In this paper, we investigate the problem of missing data or noisy information in the context of real-time monitoring and forecasting of traffic congestion for road networks. The road network is represented as a directed graph in which nodes are junctions and edges are road segments. We assume that the city has deployed high-fidelity sensors for speed reading in a subset of edges. Our objective is to infer speed readings for the remaining edges in the network as well as missing values for malfunctioning sensors. We propose a tensor representation for the series of road network snapshots, and develop a regularized factorization method to estimate the missing values, while learning the latent factors of the n...
Fluid Mechanics Research International Journal, 2018
The Marangoni effect is a very important phenomenon that occurs at an interface between two immiscible fluids and creates a source of convection. This effect is very important in two-phase flow problems. Unfortunately, the Marangoni effect is neglected by many studies of two-phase fluid flow and is still considered a challenging problem. A mathematical model has been developed in this paper showing the Marangoni effect in the case of two immiscible fluids in the Navier-Stokes equations. The mathematical translation of the convection term at the interface is developed in detail from the starting point of physical parameters using powerful mathematical tools.

EPJ Data Science, 2018
A multi-modal transportation system of a city can be modeled as a multiplex network with different layers corresponding to different transportation modes. These layers include, but are not limited to, bus network, metro network, and road network. Formally, a multiplex network is a multilayer graph in which the same set of nodes is connected by different types of relationships. Intra-layer relationships denote the road segments connecting stations of the same transportation mode, whereas inter-layer relationships represent connections between different transportation modes within the same station. Given a multi-modal transportation system of a city, we are interested in assessing its quality or efficiency by estimating the coverage, i.e., the portion of the city that can be covered by a random walker who navigates through it within a given time budget, or number of steps. We are also interested in the robustness of the whole transportation system, which denotes the degree to which the system is able to withstand a random or targeted failure affecting one or more parts of it. Previous approaches proposed a mathematical framework to numerically compute the coverage in multiplex networks. However, solutions are usually based on eigenvalue decomposition, known to be time consuming and hard to obtain in the case of large systems. In this work, we propose MUME, an efficient algorithm for Multi-modal Urban Mobility Estimation, that takes advantage of the special structure of the supra-Laplacian matrix of the transportation multiplex to compute the coverage of the system. We conduct a comprehensive series of experiments to demonstrate the effectiveness and efficiency of MUME on both synthetic and real transportation networks of various cities such as Paris, London, New York and Chicago. A future goal is to use this experience to make projections for a fast growing city like Doha.
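The notion of coverage described in this abstract can be illustrated with a small Monte Carlo simulation. This is not MUME, which works with the supra-Laplacian matrix; it is only a sketch of the quantity being estimated. Nodes are (station, layer) pairs, intra-layer edges come from each mode's route list, and inter-layer edges link copies of the same station. The function names, the uniform choice among neighbors, and the toy network are assumptions made for the example.

```python
import random


def build_multiplex(layers):
    """Adjacency over (station, layer) nodes from {layer: [(u, v), ...]}."""
    adj = {}
    for layer, edges in layers.items():
        for u, v in edges:
            # Intra-layer edge: a segment between two stations of one mode.
            adj.setdefault((u, layer), set()).add((v, layer))
            adj.setdefault((v, layer), set()).add((u, layer))
    nodes = list(adj)
    for station, layer in nodes:
        for other_station, other_layer in nodes:
            # Inter-layer edge: change of mode within the same station.
            if station == other_station and layer != other_layer:
                adj[(station, layer)].add((other_station, other_layer))
    return adj


def coverage(adj, start, steps, runs=200, seed=0):
    """Average fraction of distinct stations visited within `steps` moves."""
    rng = random.Random(seed)
    stations = {station for station, _ in adj}
    total = 0.0
    for _run in range(runs):
        node = start
        seen = {node[0]}
        for _step in range(steps):
            # Move to a uniformly chosen neighbor (sorted for reproducibility).
            node = rng.choice(sorted(adj[node]))
            seen.add(node[0])
        total += len(seen) / len(stations)
    return total / runs


# Toy network (hypothetical): a bus line A-B and a metro line B-C.
toy = build_multiplex({"bus": [("A", "B")], "metro": [("B", "C")]})
estimate = coverage(toy, start=("A", "bus"), steps=3)
```

Removing edges or stations from the `layers` dictionary before calling `build_multiplex` gives a simple way to probe how coverage degrades under random or targeted failures.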

A Multiplex Approach to Urban Mobility
Studies in Computational Intelligence, 2016
Multilayer networks have been the subject of intense research in recent years in different applications. However, in urban mobility, the multi-layer nature of transportation systems has been generally ignored, even though most large cities are spanned by more than one transportation system. These different modes of transport have usually been studied separately. It is however important to understand the interplay between different transport modes. In this study, we consider the multimodal transportation system, represented as a multiplex network, and we address the problem of urban mobility in the transportation system, in addition to its robustness and resilience under random and targeted failures. Multiplex networks are formed by a set of nodes connected by links having different relationships forming the different layers of the multiplex. We study, in particular, how random and targeted failures to the transportation multiplex network affect the way people travel in the city. More specifically, we are interested in assessing the portion of the city covered by a random walker under various scenarios. We consider the public transport of London as an application to illustrate the proposed capacity analysis method of multi-modal transportation, and we report on the robustness and the resilience of the system. This study is part of a project to develop a computational framework to better understand and predict mobility patterns in the city of Doha once its ambitious metro system is deployed in 2019. The computational framework will help the city to efficiently manage the flow of people and intelligently handle capacity through different transportation modes, in particular during mega events such as the FIFA World Cup 2022.
The proposed method is based on the study in [9], but with an efficient computational approach resulting in tremendous savings in computational time. It is scalable and lends itself to efficient implementation on parallel computers.