Papers by Katya Vladislavleva

arXiv (Cornell University), Sep 9, 2011
Wind energy plays an increasing role in the supply of energy worldwide. The energy output of a wind farm is highly dependent on the weather conditions present at the wind farm. If the output can be predicted more accurately, energy suppliers can coordinate the collaborative production of different energy sources more efficiently to avoid costly overproduction. In this paper, we take a computer science perspective on energy prediction based on weather data and analyze the important parameters as well as their correlation with the energy output. To deal with the interaction of the different parameters, we use symbolic regression based on the genetic programming tool DataModeler. Our studies are carried out on publicly available weather and energy data for a wind farm in Australia. We reveal the correlation of the different variables with the energy output. The model obtained for energy prediction gives very reliable predictions of the energy output for newly given weather data.
Development of adaptive online soft sensors using symbolic regression

Session details: Workshop: symbolic regression and modelling
Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, 2014
It is our great pleasure to welcome you to the 6th GECCO Workshop on Symbolic Regression and Modeling (2014). Over the past five workshops, we have had interesting presentations and fantastic discussions around symbolic regression, genetic programming, and the increasing demands and opportunities to impact science and industry. The workshop has spawned new research and led to many new collaborations. We are looking forward to another great workshop with four accepted papers and one invited talk. Symbolic Regression and Modeling designates the search for symbolic descriptions, usually in the language of mathematics, that describe and predict numerical data in diverse fields such as industry, economics, finance and science. Symbolic modeling covers symbolic regression: a genetic-programming-based search technique for finding symbolic formulae on numerical data in order to obtain an accurate and concise description of that data in symbolic, mathematical form. Within evolutionary computation it also covers learning classifier systems, when they are applied to obtain specific interpretable results in the field of interest. What distinguishes symbolic results from numerical results is the ability to interpret and analyze them, leading either to acceptance by field experts or to a deeper understanding of the theory in the field of application. Interpretation is key, and the workshop will focus heavily on it, as well as on advances in using symbolic modeling for real-world problems in industry, economics, finance and science. The invited talk at the workshop will be given by David Medernach from the University on the topic of training data sampling approaches in symbolic regression modeling.
Age-Fitness Pareto Optimization
Food Research International, 2020
Prime-Time: Symbolic Regression Takes Its Place in the Real World
Genetic and Evolutionary Computation, 2016
In this chapter we review a number of real-world applications where symbolic regression was used recently and with great success. Industrial-scale symbolic regression, armed with the power to select the right variables and variable combinations, build robust, trustable predictions and guide experimentation, has undoubtedly earned its place in industrial process optimization, business forecasting, product design and, now, complex systems modeling and policy making.

Computational Intelligence in Industrial Applications
Springer Handbook of Computational Intelligence, 2015
In this chapter, we review the progress and impact of computational intelligence in industrial applications, sampled from the last 10 years of our personal careers and areas of research (all authors of this chapter do computational modeling for a living). The chapter is structured as follows. Section 57.2 introduces a classification of data-driven predictive analytics problems into three groups based on the goals and the information content of the data. Section 57.3 briefly covers the most frequently used methods for predictive modeling and compares them in the context of available a priori knowledge and required execution time. Section 57.4 focuses on the importance of good workflows for successful predictive analytics projects. Section 57.5 provides several examples of such workflows. Section 57.6 concludes the chapter.
Optimizing a Cloud Contract Portfolio Using Genetic Programming-Based Load Models
Genetic and Evolutionary Computation, 2014
Scalable Symbolic Regression by Continuous Evolution with Very Small Populations
Genetic and Evolutionary Computation, 2010
Pursuing the Pareto Paradigm: Tournaments, Algorithm Variations and Ordinal Optimization
Genetic and Evolutionary Computation
Trustable Symbolic Regression Models: Using Ensembles, Interval Arithmetic and Pareto Fronts to Develop Robust and Trust-Aware Models
Genetic and Evolutionary Computation Series
Better Solutions Faster: Soft Evolution of Robust Regression Models in Pareto Genetic Programming
Genetic and Evolutionary Computation Series
... A thorough analysis of the properties of this measure is given in a recent paper by Maarten Keijzer and James Foster, who called it visitation length (Keijzer and Foster, 2007). ...
Exploiting Trustable Models via Pareto GP for Targeted Data Collection
Genetic and Evolutionary Computation
Chapter 10, by Mark Kotanchek¹, Guido Smits² and Ekaterina Vladislavleva³ (¹ Evolved Analytics LLC, Midland, MI, USA; ² ...)

Genetic and Evolutionary Computation, 2009
In this chapter we illustrate a framework based on symbolic regression to generate and sharpen the questions about the nature of the underlying system and to provide additional context and understanding based on multivariate numeric data. We emphasize the necessity of performing data modeling in a global approach, iteratively applying data analysis and adaptation, model building, and problem reduction procedures. We illustrate it on the problem of detecting outliers and extracting significant features from CountryData, a collection of economic, political, social and geographic data. We present two complementary ways of extracting outliers from the data: the content-based and the model-based approach. The content-based approach studies the geometrical structure of the multivariate data and uses data-balancing algorithms to sort the data records in order of decreasing typicalness, identifying the outliers as the least typical records before modeling is applied to the data set. The model-based approach uses symbolic regression via Pareto genetic programming (GP) to identify records that are systematically under- or over-predicted by diverse ensembles of (thousands of) global nonlinear symbolic regression models. Applied to CountryData, both approaches produce insights into the division of world countries into outliers and prototypes, and into the driving economic properties that predict gross domestic product (GDP) per capita.
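A minimal sketch of the model-based idea, assuming each ensemble member's predictions are already available as a list aligned with the targets. The function name `systematic_outliers` and the 90% `agreement` threshold are hypothetical choices for illustration, not the chapter's actual criterion: a record is flagged when nearly all models err on it in the same direction.

```python
from typing import List, Sequence


def systematic_outliers(
    y: Sequence[float],
    ensemble_preds: Sequence[Sequence[float]],
    agreement: float = 0.9,
) -> List[int]:
    """Flag records that most ensemble members err on in the same direction.

    Hypothetical criterion: record i is flagged when at least `agreement`
    of the models under-predict it, or at least `agreement` over-predict it.
    """
    n_models = len(ensemble_preds)
    if n_models == 0:
        raise ValueError("need at least one model")
    flagged = []
    for i, target in enumerate(y):
        # Count how many models fall below and above the observed value.
        under = sum(1 for preds in ensemble_preds if preds[i] < target)
        over = sum(1 for preds in ensemble_preds if preds[i] > target)
        if max(under, over) >= agreement * n_models:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0]
    preds = [
        [1.1, 2.0, 5.0],  # model A
        [0.9, 2.1, 4.0],  # model B
        [1.0, 1.9, 6.0],  # model C
    ]
    print(systematic_outliers(targets, preds))  # [2]
```

Requiring agreement in the direction of the error, rather than a large average error, is what separates systematic mispredictions from records that only a few models get wrong.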

Symbolic Regression Is Not Enough: It Takes a Village to Raise a Model
Genetic and Evolutionary Computation, 2013
From a real-world perspective, "good enough" has been achieved in the core representations and evolutionary strategies of genetic programming, assuming state-of-the-art algorithms and implementations are being used. What industrial symbolic regression needs are tools to (a) explore and refine the data, (b) explore the developed model space and extract insight and guidance from the available sample of the infinitely many possible model forms, and (c) identify appropriate models for deployment as predictors, emulators, etc. This chapter focuses on the approaches used in DataModeler to address the modeling life cycle, with a special focus on the identification of driving variables and metavariables. Exploiting the diversity of search paths followed during independent evolutions, and then looking at the distributions of variable and metavariable usage, also provides an opportunity to gather key insights. The goal of this framework, however, is not to replace the modeler but to augment the inclusion of context and the collection of insight by removing mechanistic requirements and facilitating the ability to think. We believe that the net result is higher-quality and more robust models.
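The variable-usage idea can be sketched as a presence count over an ensemble of independently evolved models. Here each model is assumed to be summarized by the set of input variables it uses; the function name `variable_presence` and the variables `x1`, `x2`, `x3` are hypothetical, and DataModeler's own analysis is not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def variable_presence(models: Iterable[Set[str]]) -> Dict[str, float]:
    """Return the fraction of models in which each variable appears,
    ordered from most to least frequently used (ties broken by name)."""
    model_list = list(models)
    if not model_list:
        return {}
    counts = Counter(var for model in model_list for var in model)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {var: count / len(model_list) for var, count in ranked}


if __name__ == "__main__":
    # Each set lists the inputs used by one (hypothetical) evolved model.
    ensemble = [{"x1", "x2"}, {"x1"}, {"x1", "x3"}]
    print(variable_presence(ensemble))
```

In this reading, variables present in most independently evolved models are candidate driving variables, while rarely used ones may be dispensable.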

Genetic Programming Theory and Practice X
Genetic and Evolutionary Computation, 2013
These contributions, written by the foremost international researchers and practitioners of Genetic Programming (GP), explore the synergy between theoretical and empirical results on real-world problems, producing a comprehensive view of the state of the art in GP. Topics in this volume include: evolutionary constraints, relaxation of selection mechanisms, diversity preservation strategies, flexing fitness evaluation, evolution in dynamic environments, multi-objective and multi-modal selection, foundations of evolvability, evolvable and adaptive evolutionary operators, foundations of injecting expert knowledge into evolutionary search, analysis of problem difficulty and required GP algorithm complexity, foundations of running GP on the cloud (communication, cooperation, flexible implementation), and ensemble methods. Additional focal points for GP symbolic regression are: (1) the need to guarantee convergence to solutions in the function discovery mode; (2) issues of model validation; (3) the need for model analysis workflows for insight generation based on generated GP solutions (model exploration, visualization, variable selection, dimensionality analysis); (4) issues in combining different types of data. Readers will discover large-scale, real-world applications of GP to a variety of problem domains via in-depth presentations of the latest and most significant results.
Genetic and Evolutionary Computation, 2011
Ordinal Pareto Genetic Programming
2006 IEEE International Conference on Evolutionary Computation
This paper introduces the first attempt to combine the theory of ordinal optimization with symbolic regression via genetic programming. A new approach called ordinal ParetoGP obtains considerably fitter solutions, with more consistency between independent runs, while spending less computational effort. The conclusions are supported by a number of experiments on three symbolic regression benchmark problems of various sizes.
Active Learning to Understand Infectious Disease Models and Improve Policy Making
PLoS Computational Biology, 2014

IEEE Transactions on Evolutionary Computation, 2010
Symbolic regression of input-output data conventionally treats all data records equally. We suggest a framework for the automatic assignment of weights to data samples that takes into account each sample's relative importance. In this paper, we study the possibilities of improving symbolic regression on real-life data by incorporating weights into the fitness function. We introduce four weighting schemes that define the importance of a point relative to proximity, surrounding, remoteness, and nonlinear deviation from its k nearest neighbors in the input space. For enhanced analysis and modeling of large imbalanced data sets, we introduce a simple multidimensional iterative technique for subsampling. This technique allows a sensible partitioning (and compression) of data into nested subsets of arbitrary size such that the subsets are balanced with respect to any of the presented weighting schemes. For cases where a given input-output data set contains some redundancy, we suggest an approach that considerably improves the effectiveness of regression by applying more modeling effort to a smaller subset of the data set with similar information content. This improvement comes from better exploration of the search space of potential solutions at the same number of function evaluations. We compare different approaches to regression on five benchmark problems with a fixed budget allocation. We demonstrate that a significant improvement in the quality of the regression models can be obtained with weighted regression, with exploratory regression using a compressed subset of similar information content, or with exploratory weighted regression on the compressed subset, weighted with one of the proposed weighting schemes.
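One plausible reading of the proximity scheme, sketched under stated assumptions: a record's weight is its mean Euclidean distance to its k nearest neighbors in input space, so isolated records count more than records in densely sampled regions, and the weights enter the fitness as a weighted mean squared error. The paper's exact definitions may differ; `proximity_weights` and `weighted_mse` are hypothetical names.

```python
import math
from typing import List, Sequence


def proximity_weights(X: Sequence[Sequence[float]], k: int) -> List[float]:
    """Weight each record by its mean Euclidean distance to its k nearest
    neighbors in input space (assumed form of a proximity weight),
    normalized so that the weights sum to 1."""
    n = len(X)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of records")
    raw = []
    for i in range(n):
        dists = sorted(math.dist(X[i], X[j]) for j in range(n) if j != i)
        raw.append(sum(dists[:k]) / k)
    total = sum(raw)
    if total == 0:  # all records coincide: fall back to equal weights
        return [1.0 / n] * n
    return [w / total for w in raw]


def weighted_mse(y: Sequence[float], y_hat: Sequence[float],
                 w: Sequence[float]) -> float:
    """Weighted mean squared error, used here as the fitness to minimize."""
    num = sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, y_hat, w))
    return num / sum(w)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]  # the last record is isolated
    w = proximity_weights(X, k=1)
    print([round(v, 3) for v in w])
    print(weighted_mse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0], w))
```

With these weights, the residual on the isolated record dominates the fitness, which is the intended effect of emphasizing sparsely sampled regions.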