Papers by Georgios Chasparis

Research Square, Feb 12, 2024
Traditional controllers have limitations: they rely on prior knowledge about the physics of the problem, require modeling of the dynamics, and struggle to adapt to abnormal situations. Deep reinforcement learning has the potential to address these problems by learning optimal control policies through exploration in an environment. For safety-critical environments, however, it is impractical to explore randomly, and replacing conventional controllers with black-box models is also undesirable. Moreover, exploration is expensive in continuous state and action spaces unless the search space is constrained. To address these challenges, we propose a specialized deep residual policy safe reinforcement learning approach with a cycle of learning, adapted for complex and continuous state-action spaces. Residual policy learning allows learning a hybrid control architecture in which the reinforcement learning agent acts in synchronous collaboration with the conventional controller. The cycle of learning initializes the policy from the expert trajectory and guides exploration around it. Further, specialization through an input-output hidden Markov model helps to optimize the policy within the region of interest (such as an abnormality), where the reinforcement learning agent is required and activated. The proposed solution is validated on the Tennessee Eastman process control problem.
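The hybrid control idea can be illustrated in a few lines. The sketch below (plain Python/NumPy, with a hypothetical proportional baseline controller and an untrained policy standing in for the RL agent) only shows the structural point of residual policy learning: the learned action is a bounded correction added to the conventional controller's output, and it is activated only when an abnormality is flagged. It is not the architecture used in the paper.

```python
import numpy as np

def baseline_controller(error, kp=1.0):
    """Stand-in for the plant's existing conventional (e.g., PID-like) controller."""
    return kp * error

def residual_policy_action(state, error, rl_policy, abnormal, residual_scale=0.1):
    """Hybrid action: conventional controller output plus a bounded learned residual.

    The residual term is only activated when the regime detector flags an
    abnormal situation, so nominal operation stays with the baseline controller.
    """
    base = baseline_controller(error)
    if not abnormal:
        return base
    residual = np.clip(rl_policy(state), -1.0, 1.0) * residual_scale
    return base + residual

# Toy usage: an untrained "policy" that returns zero leaves the baseline unchanged.
state = np.zeros(4)
print(residual_policy_action(state, error=0.5, rl_policy=lambda s: 0.0, abnormal=True))
```
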
Zenodo (CERN European Organization for Nuclear Research), Mar 1, 2018
This paper introduces a novel payoff-based learning scheme for distributed optimization in repeatedly-played strategic-form games. Standard reinforcement-based learning schemes exhibit several limitations with respect to their asymptotic stability. For example, in two-player coordination games, payoff-dominant (or efficient) Nash equilibria may not be stochastically stable. In this work, we present an extension of perturbed learning automata, namely aspiration-based perturbed learning automata (APLA), that overcomes these limitations. We provide a stochastic stability analysis of APLA in multi-player coordination games. We further show that payoff-dominant Nash equilibria are the only stochastically stable states.
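As a rough illustration of the aspiration-based idea, the following Python sketch implements one simplified per-player update: the mixed strategy is reinforced toward the played action only when the payoff exceeds a fading-memory aspiration level, and a small perturbation keeps all actions alive. The step sizes and the exact recursion are illustrative assumptions, not the APLA update defined in the paper.

```python
import numpy as np

def apla_step(x, rho, action, payoff, step=0.05, eps=0.01, nu=0.1):
    """One simplified aspiration-based perturbed learning automata update.

    x      : mixed strategy over actions (probability vector)
    rho    : aspiration level (fading-memory average of past payoffs)
    action : index of the action just played
    payoff : payoff received, assumed in [0, 1]
    """
    e = np.zeros_like(x)
    e[action] = 1.0
    gain = max(payoff - rho, 0.0)          # reinforce only above-aspiration payoffs
    x = x + step * gain * (e - x)
    x = (1 - eps) * x + eps / len(x)       # perturbation: keep all actions alive
    rho = rho + nu * (payoff - rho)        # fading-memory aspiration update
    return x / x.sum(), rho

# Toy usage for a two-action player.
x, rho = np.array([0.5, 0.5]), 0.0
x, rho = apla_step(x, rho, action=0, payoff=0.8)
print(x, rho)
```
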

Energy and Buildings, Oct 1, 2021
The need for accurate balancing in electricity markets and a larger integration of renewable sources of electricity require accurate forecasts of electricity loads in residential buildings. In this paper, we consider the problem of short-term (one-day-ahead) forecasting of the electricity-load consumption in residential buildings. To generate such forecasts, historical electricity consumption data are used, presented in the form of a time series with a fixed time step. Initially, we review standard forecasting methodologies, including naive persistence models, auto-regressive-based models (e.g., AR and SARIMA), and the triple exponential smoothing Holt-Winters (HW) model. We then introduce three forecasting models, namely i) the Persistence-based Auto-regressive (PAR) model, ii) the Seasonal Persistence-based Regressive (SPR) model, and iii) the Seasonal Persistence-based Neural Network (SPNN) model. Given that the accuracy of a forecasting model may vary during the year, and that models may differ with respect to their training times, we also investigate different variations of ensemble models (i.e., mixtures of the previously considered models) and adaptive model-switching strategies. Finally, we demonstrate through simulations the forecasting accuracy of all considered models, validated on real-world data from four residential buildings. Through an extensive series of evaluation tests, it is shown that the proposed SPR forecasting model can attain approximately a 7% forecast-error reduction over standard techniques (e.g., SARIMA and HW). Furthermore, when models have not been sufficiently trained, ensemble models based on a weighted-average forecaster can provide approximately a further 4% forecast-error reduction.
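A minimal sketch of two of the ingredients above, a seasonal persistence forecaster and a weighted-average ensemble whose weights are inversely proportional to recent errors, is given below. Function names, the error weighting and the toy data are assumptions for illustration; the PAR/SPR/SPNN models themselves are specified in the paper.

```python
import numpy as np

def seasonal_persistence(history, horizon=24):
    """Persistence baseline: tomorrow's load profile equals the last observed day."""
    return np.asarray(history[-horizon:])

def weighted_ensemble(forecasts, recent_errors):
    """Combine forecasters with weights inversely proportional to their recent error."""
    errors = np.asarray(recent_errors, dtype=float)
    weights = 1.0 / (errors + 1e-9)
    weights /= weights.sum()
    return weights @ np.asarray(forecasts)

# Toy usage with two hypothetical forecasters on a 24-step horizon.
history = np.sin(np.linspace(0, 4 * np.pi, 48)) + 1.5
f1 = seasonal_persistence(history)          # persistence forecast
f2 = np.full(24, history.mean())            # trivial mean forecast
print(weighted_ensemble([f1, f2], recent_errors=[0.3, 0.9])[:5])
```
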

arXiv (Cornell University), Aug 10, 2016
Standard (black-box) regression models may not necessarily suffice for accurate identification and prediction of thermal dynamics in buildings. This is particularly apparent when either the flow rate or the inlet temperature of the thermal medium varies significantly with time. To this end, this paper analytically derives, using physical insight, and investigates linear regression models with nonlinear regressors for system identification and prediction of thermal dynamics in buildings. Comparison is performed with standard linear regression models with respect to both a) identification error and b) prediction performance within a model-predictive-control implementation for climate control in a residential building. The implementation is performed through the EnergyPlus building simulator and demonstrates that a careful consideration of the nonlinear effects may provide significant benefits with respect to power consumption.
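The core modeling idea, a model that is linear in the parameters but uses physically motivated nonlinear regressors such as flow rate times the inlet-room temperature difference, can be sketched as follows. The regressor choice, variable names and synthetic data are illustrative assumptions, not the exact model structure derived in the paper.

```python
import numpy as np

def thermal_regressors(T_room, T_inlet, flow, T_out):
    """Regressor matrix that is linear in the parameters but contains a
    physically motivated nonlinear term, flow * (T_inlet - T_room), which
    approximates the heat delivered by the thermal medium."""
    return np.column_stack([
        T_room,                       # autoregressive room-temperature term
        T_out,                        # outside-temperature term
        flow * (T_inlet - T_room),    # nonlinear heat-input regressor
        np.ones_like(T_room),         # bias
    ])

# Fit the parameters by ordinary least squares on synthetic (hypothetical) samples.
rng = np.random.default_rng(1)
n = 200
T_room, T_inlet = 20 + rng.normal(0, 1, n), 40 + rng.normal(0, 2, n)
flow, T_out = rng.uniform(0, 1, n), 5 + rng.normal(0, 3, n)
X = thermal_regressors(T_room, T_inlet, flow, T_out)
y = 0.9 * T_room + 0.02 * T_out + 0.05 * flow * (T_inlet - T_room) + 0.5  # synthetic next-step temperature
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)
```
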

arXiv (Cornell University), Nov 7, 2016
Online output prediction is an indispensable part of any model predictive control implementation, especially when simplifications of the underlying physical model have been considered and/or the operating conditions change quite often. Furthermore, the selection of an output prediction model is strongly related to the data available, while designing/altering the data collection process may not be an option. Thus, in several scenarios, selecting the most appropriate prediction model needs to be performed during runtime. To this end, this paper introduces a supervisory output prediction scheme, tailored specifically for input-output stable bilinear systems, that aims to automate the process of selecting the most appropriate prediction model during runtime. The selection process is based upon a reinforcement-learning scheme, where prediction models are selected according to their prior prediction performance. An additional selection process is concerned with appropriately partitioning the control-inputs' domain in order to also allow for switched-system approximations of the original bilinear dynamics. We show analytically that the proposed scheme converges (in probability) to the best model and partition. We finally demonstrate these properties through simulations of temperature prediction in residential buildings.
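A rough sketch of runtime model selection driven by past prediction performance is shown below: candidate models are sampled in proportion to weights that are reinforced when their recent prediction error is small. This exponential-weighting rule is an assumption for illustration; the paper's reinforcement-learning scheme and its convergence guarantees are defined there.

```python
import numpy as np

rng = np.random.default_rng(2)

def select_model(weights):
    """Sample a prediction model with probability proportional to its weight."""
    p = weights / weights.sum()
    return rng.choice(len(weights), p=p), p

def update_weights(weights, chosen, pred_error, eta=2.0):
    """Reinforce models that predicted well (small error) during the last window."""
    weights = weights.copy()
    weights[chosen] *= np.exp(-eta * pred_error)
    return weights

# Toy usage with three hypothetical candidate predictors of differing accuracy.
weights = np.ones(3)
for _ in range(50):
    k, _ = select_model(weights)
    error = [0.1, 0.4, 0.8][k] + 0.05 * rng.standard_normal()
    weights = update_weights(weights, k, max(error, 0.0))
print(weights / weights.sum())   # probability mass should concentrate on the best model
```
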
Procedia Computer Science, 2022

We consider the problem of distributed convergence to efficient outcomes in coordination games through dynamics based on aspiration learning. Under aspiration learning, a player continues to play an action as long as the rewards received exceed a specified aspiration level. Here, the aspiration level is a fading-memory average of past rewards, and these levels are also subject to occasional random perturbations. A player becomes dissatisfied whenever a received reward is less than the aspiration level, in which case the player experiments with a probability proportional to the degree of dissatisfaction. Our first contribution is the characterization of the asymptotic behavior of the induced Markov chain of the iterated process in terms of an equivalent finite-state Markov chain. We then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of coordination games, examples of which include network-formation and common-pool games. In particular, we show that in generic coordination games the frequency at which an efficient action profile is played can be made arbitrarily large. Although convergence to efficient outcomes is desirable, in several coordination games, such as common-pool games, attainability of fair outcomes, i.e., sequences of plays at which players experience highly rewarding returns with the same frequency, might also be of special interest. To this end, we demonstrate through analysis and simulations that aspiration learning also establishes fair outcomes in all symmetric coordination games, including common-pool games.
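The per-player rule described above (keep the current action while rewards meet the aspiration level; otherwise experiment with probability proportional to the dissatisfaction; track the aspiration as a fading-memory average with occasional perturbations) can be sketched in a few lines. Step sizes and perturbation magnitudes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def aspiration_step(action, aspiration, reward, n_actions,
                    memory=0.9, perturb_prob=0.01, scale=1.0):
    """One step of simplified aspiration learning for a single player."""
    dissatisfaction = max(aspiration - reward, 0.0)
    if rng.random() < min(scale * dissatisfaction, 1.0):
        action = rng.integers(n_actions)              # experiment with a new action
    aspiration = memory * aspiration + (1 - memory) * reward
    if rng.random() < perturb_prob:
        aspiration += rng.normal(0, 0.05)             # occasional random perturbation
    return action, aspiration

# Toy usage: a single player facing a stream of rewards.
a, asp = 0, 1.0
for reward in [0.2, 0.2, 0.9, 0.9]:
    a, asp = aspiration_step(a, asp, reward, n_actions=3)
print(a, round(asp, 3))
```
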

arXiv (Cornell University), Mar 1, 2018
Motivated by the need for adaptive, secure and responsive scheduling in a great range of computing applications, including human-centered and time-critical applications, this paper proposes a scheduling framework that seamlessly adds resource-awareness to any parallel application. In particular, we introduce a learning-based framework for dynamic placement of parallel threads on Non-Uniform Memory Access (NUMA) architectures. Decisions are taken independently by each thread in a decentralized fashion, which significantly reduces computational complexity. The advantage of the proposed learning scheme is its ability to easily incorporate any multi-objective criterion and to adapt to performance variations during runtime. Under the multi-objective criterion of maximizing total completed instructions per second (i.e., both computational and memory-access instructions), we provide analytical guarantees with respect to the expected performance of the parallel application. We also compare the performance of the proposed scheme with the Linux operating system scheduler on an extensive set of applications, including both computationally and memory-intensive ones. We have observed that the performance improvement can be significant, especially under limited availability of resources and under irregular memory-access patterns.

arXiv (Cornell University), Oct 13, 2016
This paper presents an online transfer learning framework for improving temperature predictions in residential buildings. In transfer learning, prediction models trained on a set of available data from a target domain (e.g., a house with limited data) can be improved through the use of data generated from similar source domains (e.g., houses with rich data). Given also the need for prediction models that can be trained online (e.g., as part of a model-predictive-control implementation), this paper introduces the generalized online transfer learning algorithm (GOTL). It employs a weighted combination of the available predictors (i.e., the target and source predictors) and guarantees convergence to the best weighted predictor. Furthermore, the use of Transfer Component Analysis (TCA) allows for more than a single source domain, since it facilitates fitting a single model to more than one source domain (house), thereby enabling GOTL to transfer knowledge from multiple source domains. We further validate our results through experiments in climate control for residential buildings and show that GOTL may lead to non-negligible energy savings for given comfort levels.
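The weighted combination of target and source predictors can be illustrated with an exponential-weights style sketch: weights shift online toward whichever predictor has been more accurate recently. The update rule and learning rate below are assumptions for illustration, not the exact GOTL recursion or its convergence conditions.

```python
import numpy as np

class WeightedPredictorCombination:
    """Online weighted combination of a target predictor and source predictors.

    A rough exponential-weights sketch of the idea behind GOTL; the exact
    update rule and step sizes are specified in the paper.
    """
    def __init__(self, n_predictors, eta=1.0):
        self.w = np.ones(n_predictors) / n_predictors
        self.eta = eta

    def predict(self, predictions):
        return float(self.w @ np.asarray(predictions))

    def update(self, predictions, actual):
        losses = (np.asarray(predictions) - actual) ** 2
        self.w *= np.exp(-self.eta * losses)
        self.w /= self.w.sum()

# Toy usage: weight shifts toward the more accurate (hypothetical) predictor.
combo = WeightedPredictorCombination(2)
for _ in range(100):
    truth = 21.0
    combo.update([truth + 0.1, truth + 1.0], truth)   # target vs. source prediction
print(combo.w)
```
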
Procedia Computer Science, 2023

The problem of efficient resource allocation has drawn significant attention in many scientific disciplines due to its direct societal benefits, such as energy savings. Traditional approaches to online resource allocation neglect the potential benefit of feedback information available from the running tasks/loads, as well as the potential flexibility of a task to adjust its operation level in order to increase efficiency. The present paper builds upon recent developments in the area of bandwidth allocation in computing systems and proposes a design methodology for addressing a large class of online resource allocation problems with flexible tasks. The proposed methodology is based upon a measurement- or utility-based learning scheme, namely reinforcement learning. We demonstrate through analysis the potential of the proposed scheme to asymptotically provide efficient resource allocation when only measurements of the tasks' performances are available.

arXiv (Cornell University), Feb 27, 2017
This paper considers a class of reinforcement-learning dynamics that belongs to the family of Learning Automata and provides a stochastic-stability analysis in strategic-form games. For this class of dynamics, convergence to pure Nash equilibria has been demonstrated only for the fine class of potential games. Prior work primarily provides convergence properties of the dynamics through stochastic approximations, where the asymptotic behavior can be associated with the limit points of an ordinary differential equation (ODE). However, analyzing global convergence through the ODE approximation requires the existence of a Lyapunov or a potential function, which naturally restricts the applicability of these algorithms to a fine class of games. To overcome these limitations, this paper introduces an alternative framework for analyzing stochastic stability that is based upon an explicit characterization of the (unique) invariant probability measure of the induced Markov chain.
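Stochastic stability of such dynamics can also be probed empirically. The sketch below simulates a generic perturbed learning automata update for two players in a 2x2 coordination game and reports the empirical joint-action frequencies. The payoffs, step size and perturbation level are illustrative assumptions, and the update shown is a generic rule rather than the specific dynamics analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Payoff matrix of a 2x2 coordination game (both players receive payoff[a1, a2]).
payoff = np.array([[1.0, 0.0],
                   [0.0, 0.6]])

def pla_update(x, action, u, step=0.02, eps=0.005):
    """Perturbed learning automata update: move toward the played action in
    proportion to the received payoff, then mix with uniform exploration."""
    e = np.zeros_like(x)
    e[action] = 1.0
    x = x + step * u * (e - x)
    return (1 - eps) * x + eps / len(x)

x1 = x2 = np.array([0.5, 0.5])
counts = np.zeros((2, 2))
for _ in range(20000):
    a1, a2 = rng.choice(2, p=x1), rng.choice(2, p=x2)
    u = payoff[a1, a2]
    x1, x2 = pla_update(x1, a1, u), pla_update(x2, a2, u)
    counts[a1, a2] += 1
print(counts / counts.sum())   # empirical joint-action frequencies
```
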

Frontiers in chemical engineering, Jun 16, 2021
The number of sensors in the process industry is continuously increasing as they are getting faster, better and cheaper. Due to the rising amount of available data, the processing of the generated data has to be automated in a computationally efficient manner. Such a solution should also be easily implementable and reproducible, independently of the details of the application domain. This paper provides a suitable and versatile infrastructure that deals with Big Data in the process industry on various platforms, using efficient, fast and modern technologies for data gathering, processing, storing and visualization. Contrary to prior work, we provide an easy-to-use, easily reproducible, adaptable and configurable Big Data management solution with a detailed implementation description that does not require expert or domain-specific knowledge. In addition to the infrastructure implementation, we focus on monitoring both infrastructure inputs and outputs, including incoming process data as well as model predictions and performances, thus allowing for early interventions and actions if problems occur.

arXiv (Cornell University), Sep 18, 2017
This paper considers a class of discrete-time reinforcement-learning dynamics and provides a stochastic-stability analysis in repeatedly played positive-utility (strategic-form) games. For this class of dynamics, convergence to pure Nash equilibria has been demonstrated only for the fine class of potential games. Prior work primarily provides convergence properties through stochastic approximations, where the asymptotic behavior can be associated with the limit points of an ordinary differential equation (ODE). However, analyzing global convergence through an ODE approximation requires the existence of a Lyapunov or a potential function, which naturally restricts the analysis to a fine class of games. To overcome these limitations, this paper introduces an alternative framework for analyzing convergence under reinforcement learning that is based upon an explicit characterization of the invariant probability measure of the induced Markov chain. We further provide a methodology for computing the invariant probability measure in positive-utility games, together with an illustration in the context of coordination games.
arXiv (Cornell University), Mar 7, 2018
This paper introduces a novel payoff-based learning scheme for distributed optimization in repeatedly-played strategic-form games. Standard reinforcement-based learning schemes exhibit several limitations with respect to their asymptotic stability. For example, in two-player coordination games, payoff-dominant (or efficient) Nash equilibria may not be stochastically stable. In this work, we present an extension of perturbed learning automata, namely aspiration-based perturbed learning automata (APLA), that overcomes these limitations. We provide a stochastic stability analysis of APLA in multi-player coordination games. We further show that payoff-dominant Nash equilibria are the only stochastically stable states.

2019 18th European Control Conference (ECC), 2019
We consider the Austrian model for the liberalized electricity market, which is based on the Balance-Group (BG) organization. According to this model, all participants (consumers and producers) are organized into (virtual) balance groups, within which injection and withdrawal of power are balanced. In this paper, the available energy potential within a BG corresponds to the energy that can additionally be exchanged (generated/consumed) by directly controlling the operation of the participants' battery-storage systems. Under such a scheme, a participant's battery is directly controlled in exchange for some compensation. We present an optimization framework that allows a BG to optimally utilize the participants' batteries, either for exchanging the available energy potential in the spot market (Day-Ahead or Intra-Day) or for reacting to predicted energy imbalances.

IFAC-PapersOnLine, 2020
We address the problem of trading energy flexibility, derived from pools of residential photovoltaic and battery-storage systems, on the Day-ahead electricity market. By flexibility, we mean any additional energy that can be stored to or withdrawn from the participating batteries/households at a given time during the next day. The optimization variables include the selection/activation of a subset of participating batteries and the amount of flexibility that should be extracted. Furthermore, the optimization objective corresponds to the expected forecast revenues that can be generated by trading this flexibility on the Day-ahead electricity market. Given the high computational complexity of a full-scale optimization in the case of a large number of participating batteries, we propose a reinforcement-learning-based methodology, which admits linear complexity in the number of participating batteries. The proposed methodology advances prior work with respect to the integration of a large number of batteries. Furthermore, it extends prior work of the authors with respect to providing analytical performance guarantees in comparison with the baseline/nominal operation of the battery. Finally, we compare through simulations the performance of the proposed method with a Linear-Programming-based optimization that provides the exact optimum.
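For intuition on the Linear-Programming baseline, a toy instance might look like the sketch below (using SciPy's linprog): choose non-negative hourly flexibility per battery to maximize day-ahead revenue, subject to a per-hour power limit and a daily energy limit per battery. All parameters and the exact constraint set are assumptions for illustration, not the formulation used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy instance: 3 batteries offering flexibility over 4 hours.
n_bat, n_hours = 3, 4
price = np.array([30.0, 55.0, 80.0, 40.0])        # day-ahead prices (EUR/MWh)
p_max = np.array([0.005, 0.010, 0.008])           # per-hour power limit (MWh) per battery
e_max = np.array([0.015, 0.020, 0.016])           # total daily energy limit per battery

# Decision variables x[b, t] >= 0: flexibility sold by battery b in hour t.
c = -np.tile(price, n_bat)                         # linprog minimizes, so negate revenue
A_ub = np.kron(np.eye(n_bat), np.ones(n_hours))    # enforce sum_t x[b, t] <= e_max[b]
b_ub = e_max
bounds = [(0, p_max[b]) for b in range(n_bat) for _ in range(n_hours)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x.reshape(n_bat, n_hours), -res.fun)     # optimal schedule and revenue
```
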

Lecture Notes in Computer Science, 2017
This paper introduces a resource allocation framework specifically tailored to the problem of dynamic placement (or pinning) of parallelized applications to processing units. Decisions are updated recursively for each thread by a resource manager/scheduler, which runs in parallel to the application's threads and periodically records their performances and assigns new CPU affinities to them. For updating the CPU affinities, the scheduler uses a reinforcement-learning algorithm, each branch of which is responsible for assigning a new placement strategy to each thread. The proposed resource allocation framework is flexible enough to address alternative optimization criteria, such as maximum average processing speed and minimum speed variance among threads. We demonstrate the response of the dynamic scheduler under fixed and varying availability of resources (e.g., when other applications are running on the same platform) in a parallel implementation of Ant-Colony Optimization.
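A rough sketch of the per-thread learning update is given below: each thread keeps a probability distribution over candidate CPU placements and reinforces the placement in proportion to the processing speed measured there (actual pinning on Linux would go through a call such as os.sched_setaffinity). The step size, exploration rate and simulated speeds are illustrative assumptions, not the scheduler's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

class ThreadPlacementLearner:
    """Per-thread learning automaton over candidate CPU placements.

    The probability of re-selecting a CPU grows with the normalized processing
    speed measured while the thread ran there (a rough sketch of the scheme).
    """
    def __init__(self, n_cpus, step=0.1, eps=0.02):
        self.x = np.ones(n_cpus) / n_cpus
        self.step, self.eps = step, eps

    def pick_cpu(self):
        return int(rng.choice(len(self.x), p=self.x))

    def update(self, cpu, speed):
        e = np.zeros_like(self.x)
        e[cpu] = 1.0
        self.x += self.step * np.clip(speed, 0.0, 1.0) * (e - self.x)
        self.x = (1 - self.eps) * self.x + self.eps / len(self.x)

# Toy usage: CPU 2 is (hypothetically) the fastest placement for this thread.
learner = ThreadPlacementLearner(n_cpus=4)
for _ in range(500):
    cpu = learner.pick_cpu()
    speed = [0.4, 0.5, 0.9, 0.3][cpu] + 0.05 * rng.standard_normal()
    learner.update(cpu, speed)
print(learner.x)   # placement strategy should concentrate on CPU 2
```
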