Papers by Vana Kalogeraki

The lack of parking spaces in large urban cities is responsible for a series of problems such as traffic congestion, air pollution and social anxiety. A promising approach to alleviating these effects is to harness contributions from a crowd of people equipped with mobile phones to find available and affordable parking spaces. In this work we propose a crowdsourcing system that aims to find the most suitable parking options for users in a smart city. We have developed ParkMatch, the algorithm deployed in our crowdsourcing system; unlike existing approaches, which give users a large, unfiltered set of parking possibilities, it provides the set of results most appropriate to the user's needs. Through an experimental evaluation using our simulation model, we show the effectiveness and benefits of our approach. CCS Concepts: Human-centered computing → Ubiquitous and mobile computing systems and tools.
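
The abstract does not say how ParkMatch ranks options. As a purely illustrative sketch of the filter-then-rank idea (returning a short list instead of every possibility), the Python below assumes hypothetical user constraints (a maximum hourly price and a maximum walking distance) and orders the remaining options by distance, then price. The `ParkingOption` type, the `match_parking` function and its parameters are all hypothetical, not ParkMatch itself.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class ParkingOption:
    """A hypothetical crowdsourced parking report."""
    name: str
    x_km: float  # position on a local planar grid (assumed)
    y_km: float
    price_per_hour: float


def match_parking(options, user_xy, max_price, max_walk_km, k=3):
    """Return the names of at most k options that satisfy the user's
    constraints, closest first and, on equal distance, cheapest first."""
    ux, uy = user_xy
    feasible = []
    for opt in options:
        d = hypot(opt.x_km - ux, opt.y_km - uy)
        # Drop options that violate either hard constraint.
        if opt.price_per_hour <= max_price and d <= max_walk_km:
            feasible.append((d, opt.price_per_hour, opt.name))
    feasible.sort()  # by distance, then price, then name
    return [name for _, _, name in feasible[:k]]
```

Capping the result at `k` is what distinguishes this from returning the unfiltered list; a real system would also weigh availability reports and their freshness.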

Nowadays, distributed processing frameworks like Apache Spark are successfully used to execute big data applications. Despite their wide adoption, little work has been done on controlling these applications' energy consumption. Datacenters account for over 2% of total US electricity usage, so minimizing the energy consumption of Spark applications can be extremely helpful. Solving this problem requires scheduling Spark applications in an energy-efficient way. However, the problem is challenging because application performance requirements must also be considered. In this work, we provide an overview of a novel framework that orchestrates the execution order of Spark applications and exploits DVFS to tune the computing nodes' CPU frequencies, in order to minimize energy consumption while satisfying the applications' performance requirements. Our early experimental results illustrate how our framework works and its benefits.
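
A minimal sketch of the DVFS side of such a framework, under loudly stated assumptions: runtime is taken to scale as `t_at_fmax * f_max / f` (a CPU-bound application), node power is modeled as `p_static + c * f**3`, and the constants are made up. The hypothetical `pick_frequency` function chooses the available frequency with the lowest modeled energy whose runtime still meets the deadline; it does not model the execution ordering of applications.

```python
def pick_frequency(t_at_fmax, deadline, freqs_ghz, p_static=20.0, c=5.0):
    """Return (frequency, modeled energy) with the lowest energy that meets
    the deadline, or None if no frequency is fast enough.
    The runtime and power models are illustrative assumptions."""
    f_max = max(freqs_ghz)
    best = None
    for f in freqs_ghz:
        runtime = t_at_fmax * f_max / f   # seconds, CPU-bound scaling (assumed)
        if runtime > deadline:
            continue                      # violates the performance requirement
        power = p_static + c * f ** 3     # watts, toy static + dynamic power model
        energy = power * runtime          # joules
        if best is None or energy < best[1]:
            best = (f, energy)
    return best
```

For example, `pick_frequency(100.0, 200.0, [1.0, 1.5, 2.0, 2.5])` skips 1.0 GHz (250 s exceeds the deadline) and returns 1.5 GHz, the slowest frequency that still finishes in time under this model.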

Pub/sub systems have been widely used in industry as the connecting piece between high-rate producers or mobile clients and end-services, due to their ability to handle high-volume, high-velocity messages and achieve high throughput. Apache Kafka is one of the most popular Big Data messaging systems. Although Kafka's modular Consumer API allows businesses to take advantage of its rich set of features, Kafka deployments often suffer from improperly configured topics/partitions or from load imbalances caused by spikes or high volumes of messages injected into the same topic. Our research goal is to develop a novel framework that investigates smart queuing mechanisms to overcome these issues and deal effectively with the sudden bursts and overloads frequently experienced in these systems. Our approach adds rate control capabilities to Kafka's Consumer API, which allows us to effectively meet the requirements of different end-services without interference among them.
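
Rate control of the kind described can be illustrated with a per-service token bucket, a standard technique swapped in here for illustration; this is not Kafka's Consumer API and not the authors' queuing mechanism. The hypothetical `TokenBucket` and `dispatch` names below take explicit timestamps so the behavior is deterministic, and each end-service gets its own bucket, so a burst for one service cannot consume another's budget.

```python
class TokenBucket:
    """Allows `rate_per_s` messages per second on average, with bursts up to `capacity`."""

    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = 0.0         # timestamp of the previous refill

    def try_consume(self, now, n=1):
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def dispatch(messages, buckets):
    """messages: list of (timestamp, service). Returns (delivered, deferred);
    deferred messages would be re-queued rather than dropped."""
    delivered, deferred = [], []
    for ts, svc in messages:
        target = delivered if buckets[svc].try_consume(ts) else deferred
        target.append((ts, svc))
    return delivered, deferred
```

Isolating buckets per end-service is the design choice that prevents interference: service `"a"` exhausting its tokens has no effect on whether service `"b"` is served.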
IEEE Conference Proceedings, 2016

In recent years, sensing systems in urban environments are being replaced by Unmanned Aerial Vehicles (UAVs). UAVs, also known as drones, have shown great potential in executing different kinds of sensing missions, such as search and rescue, object tracking and inspection. Their sensing capabilities and agile mobility can replace existing complex solutions for such missions. However, coordinating a swarm of drones to accomplish a mission is not a trivial task. Existing works in the literature focus solely on managing the swarm and do not provide options for automating entire missions. In this paper, we present PaROS (PROgramming Swarm), a novel framework for programming a swarm of UAVs. PaROS provides a set of programming primitives for orchestrating a swarm of drones and automating certain types of missions. These primitives, referred to as abstract swarms, control every drone in the swarm and hide low-level details from the programmer, such as assigning flight plans, task partitioning, failure recovery and area division. Our experimental evaluation shows that our approach is stable, time-efficient and practical.
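
To show what hiding area division and failure recovery behind a swarm-level primitive might look like, here is a hypothetical Python sketch. It assumes a rectangular area split into equal-width vertical strips, one per drone, and recovers from failures by re-dividing the same area among the surviving drones. `divide_area` and `recover` are illustrative names, not PaROS's primitives.

```python
def divide_area(x_min, x_max, y_min, y_max, drones):
    """Split the rectangle into equal-width vertical strips, one per drone.
    Returns {drone: (x_lo, x_hi, y_lo, y_hi)}."""
    if not drones:
        raise ValueError("need at least one drone")
    width = (x_max - x_min) / len(drones)
    return {
        drone: (x_min + i * width, x_min + (i + 1) * width, y_min, y_max)
        for i, drone in enumerate(drones)
    }


def recover(plan, failed, bounds):
    """Failure recovery (assumed policy): re-divide the whole area
    among the drones that have not failed."""
    survivors = [d for d in plan if d not in failed]
    return divide_area(*bounds, survivors)
```

A mission programmer would call only these two functions; strip geometry and reassignment stay hidden, which is the kind of abstraction the paragraph describes. A real system would also avoid re-flying strips that were already covered before the failure.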

In a networked world, events are transmitted from multiple distributed sources into complex event processing (CEP) systems, where they are related to one another along multiple dimensions, e.g., temporal and spatial, to create complex events. The big data era brought an increase in the scale and frequency of event reporting. The Internet of Things adds another layer of complexity with multiple, continuously changing event sources, not all of which are perfectly reliable and which often suffer from late arrivals. In this work we propose a probabilistic model to deal with the reduced reliability of event arrival times. We use statistical theory to fit the distributions of inter-generation times at the source and of network delays per event type. Equipped with these distributions, we propose a predictive method for determining whether an event belonging to a window has yet to arrive. Given user-defined tolerance levels (on quality and timeliness), we propose an algorithm for dynamically determining how long a complex event time-window should remain open. Using a thorough empirical analysis, we compare the proposed algorithm against state-of-the-art mechanisms for handling delayed event arrivals and show the superiority of our method.
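
The window-extension idea can be sketched as follows, assuming (for illustration only) that network delays per event type follow an exponential distribution fitted by maximum likelihood, whose mean estimate is the sample mean. For an exponential delay with mean μ, P(delay > d) = exp(-d/μ), so keeping a window open an extra d = -μ ln(tol) bounds the chance that a given event is still missing by `tol`. The function names and the single tolerance parameter are hypothetical; the paper fits its own distributions per event type and also models inter-generation times, which this sketch omits.

```python
import math
from statistics import mean


def window_extension(delays, miss_tolerance):
    """Extra wait d such that P(delay > d) == miss_tolerance
    under an exponential delay model fitted to `delays`."""
    mu = mean(delays)  # maximum-likelihood estimate of the exponential mean
    return -mu * math.log(miss_tolerance)


def close_time(window_end, delays_by_type, miss_tolerance):
    """Keep the window open long enough for the slowest event type."""
    return window_end + max(
        window_extension(d, miss_tolerance) for d in delays_by_type.values()
    )
```

Tightening `miss_tolerance` trades timeliness for quality: halving it adds μ·ln 2 of extra waiting for each event type.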

The problem of coping with the demands of determinism while meeting latency constraints is challenging in distributed data stream processing systems that must process high-volume data streams arriving from different unsynchronized input sources. To process the streaming data deterministically, such systems need mechanisms that synchronize the order in which operators process tuples. On the other hand, achieving real-time response in such a system requires a careful trade-off between determinism and low-latency performance. We build on a recently proposed approach to data exchange and synchronization in stream processing, namely ScaleGate, which comes with guarantees of determinism and an efficient lock-free implementation, enabling high scalability. Considering the challenges and trade-offs implied by real-time constraints, we propose a system that comprises (a) a novel data structure called Slack-ScaleGate (SSG), along with its algorithmic implementation, which guarantees deterministic processing of tuples as long as they can meet their latency constraints, and (b) a method to dynamically tune the maximum amount of time a tuple can wait in the SSG data structure, relaxing the determinism guarantees when needed in order to satisfy the latency constraints. Our detailed experimental evaluation using a traffic monitoring application deployed in the city of Dublin illustrates how our approach works and its benefits.
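
The synchronization rule behind ScaleGate-style merging, plus a slack timeout, can be sketched with a heap. Assumptions: each source emits tuples in non-decreasing timestamp order; a tuple is ready once every source has delivered a tuple with an equal or larger timestamp; and only the head of the heap is checked against a hypothetical `max_wait` slack, so output stays timestamp-ordered. This is a single-threaded illustration, not the lock-free SSG data structure, and `SlackMerger` is a hypothetical name.

```python
import heapq


class SlackMerger:
    """Merges tuples from several sources in timestamp order, releasing a
    tuple early if it has waited `max_wait` (relaxing determinism)."""

    def __init__(self, sources, max_wait):
        self.latest = {s: float("-inf") for s in sources}  # per-source watermark
        self.heap = []  # entries: (ts, seq, source, payload, arrival_time)
        self.max_wait = max_wait
        self.seq = 0    # tie-breaker so payloads are never compared

    def add(self, source, ts, payload, now):
        self.latest[source] = max(self.latest[source], ts)
        heapq.heappush(self.heap, (ts, self.seq, source, payload, now))
        self.seq += 1

    def poll(self, now):
        out = []
        while self.heap:
            ts, _, source, payload, arrived = self.heap[0]
            ready = ts <= min(self.latest.values())      # deterministic release
            expired = now - arrived >= self.max_wait     # slack exhausted
            if not (ready or expired):
                break
            heapq.heappop(self.heap)
            out.append((ts, source, payload))
        return out
```

Lowering `max_wait` plays the role of the dynamic tuning in part (b): tuples meet tighter latency bounds at the cost of possibly leaving the deterministic order when a source lags.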
IEEE Conference Proceedings, 2016
The modern world is increasingly dependent on software, and yet the software we use is often manifestly insecure and unreliable. It seems only a matter of time until a "cyber Pearl Harbor" occurs. As a result, we can expect future systems to be programmed ...

In this paper we study the problem of complete coverage path planning for a set of Unmanned Aerial Vehicles (UAVs) in urban environments. The geographical area we aim to cover is represented as a grid of cells with no holes, and the center of every cell in this grid represents a node. Thus, the problem we solve is: given a geographical area to be explored by a set of UAVs, how can we plan a path that covers all nodes in the area while minimizing the distance traveled by the UAVs? We propose an algorithm that determines a complete coverage path, i.e., an exploration path in which every node is visited exactly once, while minimizing the total distance traveled by the UAV. We illustrate that our approach can also be applied to multiple UAVs flying simultaneously over the area, thus minimizing the exploration time.
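
For the special case of a rectangular grid (an assumption; the paper handles any grid without holes), a boustrophedon or "snake" sweep visits every cell exactly once using only unit moves, so a single UAV travels rows × cols − 1 cell widths. Splitting that path into contiguous, nearly equal pieces is one simple way to share it among several UAVs. Both helpers are hypothetical illustrations, not the authors' algorithm.

```python
def snake_path(rows, cols):
    """Boustrophedon sweep of a rows x cols grid: left-to-right on even
    rows, right-to-left on odd rows. Consecutive cells are always adjacent."""
    path = []
    for r in range(rows):
        cols_in_order = range(cols) if r % 2 == 0 else reversed(range(cols))
        path.extend((r, c) for c in cols_in_order)
    return path


def split_path(path, k):
    """Cut the path into k contiguous segments whose lengths differ by at most one."""
    n = len(path)
    bounds = [(i * n) // k for i in range(k + 1)]
    return [path[bounds[i]:bounds[i + 1]] for i in range(k)]
```

Each UAV still needs to fly to the start of its segment; this sketch ignores that transit cost, which a real planner would include in the distance being minimized.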

In this demonstration we present Dione, a novel framework for automatically profiling and tuning big data applications. Our system allows a non-expert user to submit Spark or Flink applications to his/her cluster, and Dione automatically determines the impact of different configuration parameters on the application's execution time and monetary cost. Dione is the first framework that exploits similarities in the execution plans of different applications to reduce the number of profiling runs required for building prediction models that capture the impact of the configuration parameters on the metrics of interest. Dione exploits these prediction models to tune the configuration parameters in a way that minimizes either the application's execution time or its monetary cost to the user. Finally, Dione's Web UI visualizes the impact of the configuration parameters on the execution time and the monetary cost, and enables the user to submit the application with the recommended parameter values.
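
Dione's plan-similarity technique is not reproduced here. As a hedged illustration of the general idea (fit a prediction model from a few profiling runs, then pick a configuration), the sketch below assumes execution time follows t(n) = a + b/n in the number of executors n, fits a and b by ordinary least squares, and recommends the fastest candidate whose modeled cost, n · t(n) · price, stays within a budget. The model form, prices and function names are assumptions.

```python
def fit_inverse_model(samples):
    """samples: list of (executors, seconds) from profiling runs.
    Fits t = a + b / n by ordinary least squares on x = 1 / n."""
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need profiling runs with at least two executor counts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def recommend(samples, candidates, price_per_executor_second, budget):
    """Return (executors, predicted seconds, predicted cost) for the fastest
    candidate within budget, or None if every candidate is too expensive."""
    a, b = fit_inverse_model(samples)
    best = None
    for n in candidates:
        t = a + b / n
        cost = n * t * price_per_executor_second
        if cost <= budget and (best is None or t < best[1]):
            best = (n, t, cost)
    return best
```

With profiling runs at 1, 2 and 4 executors, the model extrapolates to 8 and 16; fewer runs make the fit cheaper but less trustworthy, which is the cost that reusing similar execution plans aims to reduce.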

Social networks have become the de facto online resource for people to share, comment on and be informed about events pertinent to their interests and livelihood, ranging from road traffic or an illness to concerts and earthquakes, to economics and politics. This has driven research endeavors that analyse such data. In this paper, we focus on how Content Networks can help us identify events effectively. Content Networks incorporate both the structural and the content-related information of a social network in a unified way, thereby bringing together two disparate lines of research: graph-based and content-based event discovery in social media. We model interactions between two types of nodes, users and content, and introduce an algorithm that builds heterogeneous, dynamic graphs and reveals content links in the network's structure. By linking similar content nodes and tracking connected components over time, we can effectively identify different types of events. Our evaluation on social media streaming data suggests that our approach outperforms state-of-the-art techniques, while showcasing the significance of hidden links to the quality of the results.
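
A minimal, single-snapshot sketch of linking similar content nodes and reading events off connected components: posts are compared by Jaccard similarity of their word sets (a swapped-in similarity measure with an arbitrary threshold), similar pairs are merged with union-find, and each component is a candidate event. It ignores user nodes and the temporal tracking described above; `find_events` is a hypothetical name.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_events(posts, threshold=0.3):
    """posts: {post_id: text}. Links posts whose word sets are similar and
    returns the connected components as sorted lists of post ids."""
    tokens = {pid: set(text.lower().split()) for pid, text in posts.items()}
    parent = {pid: pid for pid in posts}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(posts)
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if jaccard(tokens[p], tokens[q]) >= threshold:
                parent[find(p)] = find(q)  # link similar content nodes

    groups = {}
    for pid in ids:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(g) for g in groups.values())
```

Running this over successive time slices and matching components between slices would be one way to approximate tracking events over time.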

Springer eBooks, 2005
Wireless sensor networks are emerging as a new computational platform consisting of small, low-power and inexpensive nodes used in a broad set of application areas, including environmental monitoring, habitat monitoring and disaster recovery. Typically, sensor nodes are deployed over a geographical area for the purpose of detecting, tracking and monitoring events of interest. Since sensor nodes are deployed over a large land region, the objective is to achieve complete coverage of the region, that is, every location in the region lies in the observation field of at least one sensor node. However, the initial placement of sensors may not achieve this goal for various reasons: the number of original sensors may have been too low; the original placement may have been random (for example, sensors deployed from the air), leaving parts of the region uncovered; or some of the sensors may have malfunctioned, leaving coverage holes. In this paper we consider the coverage restoration problem in sensor networks. The goal is to find a minimal set of new sensors, and their locations, such that adding them to an existing sensor field achieves complete coverage of the region under surveillance. The technique we propose is distributed and minimizes communication costs. Its key idea is an efficient yet very accurate representation of the uncovered area based on techniques from discrepancy theory. By representing the uncovered area as a set of points, we can use efficient and simple algorithms for finding small sets of sensors that cover it. We partition the sensor network into cells and run these algorithms locally. We also present an extensive experimental evaluation to validate our approach.
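
The point-set idea can be illustrated with a centralized sketch for a single cell. Assumptions: the region is a `width × height` rectangle, coverage is a disk of fixed `radius`, the region is sampled with Halton points (one common low-discrepancy sequence from discrepancy theory), and new sensors are placed greedily at the uncovered sample point that covers the most other uncovered points. This greedy set cover yields a small set, not a guaranteed minimal one, and only guarantees coverage of the sample points; all function names are hypothetical.

```python
from math import dist


def halton(i, base):
    """i-th element (i >= 1) of the van der Corput sequence in `base`."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def sample_points(n, width, height):
    """Low-discrepancy 2-D Halton samples (bases 2 and 3) scaled to the region."""
    return [(width * halton(i, 2), height * halton(i, 3)) for i in range(1, n + 1)]


def restore_coverage(sensors, radius, width, height, n_points=200):
    """Greedily add sensors until every sample point is within `radius`
    of some existing or added sensor. Returns the added locations."""
    pts = sample_points(n_points, width, height)
    uncovered = [p for p in pts if all(dist(p, s) > radius for s in sensors)]
    added = []
    while uncovered:
        # Candidate locations are the uncovered points themselves.
        best = max(uncovered, key=lambda c: sum(dist(c, p) <= radius for p in uncovered))
        added.append(best)
        uncovered = [p for p in uncovered if dist(p, best) > radius]
    return added
```

The loop always terminates because each chosen point covers at least itself. Running one instance per cell, with sensors near cell borders shared, is how the distributed version described above would partition the work.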
Lecture Notes in Computer Science, 2021
The original version of the book was inadvertently published with incorrect acknowledgements in chapters 28 and 31. The acknowledgements have been corrected and read as follows:
International Conference on Machine Learning, Jul 11, 2015
Mobile devices, such as smartphones, allow us to use computationally expensive algorithms and techniques. In this paper, we study algorithms for finding the most similar trajectory within a set of trajectories. We built a framework that enables the user to compare a trajectory Q with trajectories that have been generated and stored on mobile devices. The system returns to the user the most similar trajectory according to the selected algorithm. The trajectory similarity algorithms have been implemented for mobile devices running Android OS. We evaluate our algorithms with real geospatial data.
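
The abstract does not name the similarity algorithms. Dynamic time warping (DTW) is one widely used trajectory distance, so the sketch below uses it to return the stored trajectory closest to a query Q. It treats points as planar (x, y) coordinates; real geospatial data would normally use a geodesic distance instead. `dtw` and `most_similar` are hypothetical names.

```python
from math import dist, inf


def dtw(a, b):
    """Dynamic time warping distance between two point sequences,
    using Euclidean distance between matched points."""
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or advance both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def most_similar(query, trajectories):
    """trajectories: {name: [(x, y), ...]}. Returns the name with the smallest DTW distance."""
    return min(trajectories, key=lambda name: dtw(query, trajectories[name]))
```

DTW costs O(n·m) time and memory per comparison, which is why the choice of algorithm matters on resource-constrained mobile devices.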
Lecture Notes in Computer Science, 2004