The Second International Conference on Advanced Engineering Computing and Applications in Sciences, 2008
In a large number of applications, engineers have to estimate the values of an unknown function given some observed samples. This task is referred to as function approximation or generalization. One way to solve the problem is to regress a family of parameterized functions so that it fits the observed samples as well as possible. Yet batch methods are usually employed, and the parameterization is typically linear. Moreover, very few approaches try to quantify the uncertainty reduction that occurs when acquiring more samples (and thus more information), which can be quite useful depending on the application. In this paper we propose a sparse nonlinear Bayesian online kernel regression. Sparsity is achieved in a preprocessing step by using a dictionary method. The nonlinear Bayesian kernel regression can then be carried out online by a sigma-point Kalman filter. First experiments on a cardinal sine regression task show that our approach is promising.
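The abstract does not spell out which dictionary method is used; below is a minimal sketch of one common choice, an approximate-linear-dependence (ALD) test, in which a sample joins the dictionary only if it cannot be well approximated in feature space by the current dictionary elements. The Gaussian kernel, the threshold nu, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf(x, y, sigma=0.5):
    # Gaussian RBF kernel (an assumed choice; the paper may use another kernel)
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def build_dictionary(samples, nu=0.1, sigma=0.5):
    # ALD-style sparsification: keep a sample only if its feature-space
    # projection residual onto span(dictionary) exceeds the threshold nu.
    dictionary = [samples[0]]
    for x in samples[1:]:
        k = np.array([rbf(x, d, sigma) for d in dictionary])
        K = np.array([[rbf(a, b, sigma) for b in dictionary] for a in dictionary])
        delta = rbf(x, x, sigma) - k @ np.linalg.solve(K + 1e-10 * np.eye(len(K)), k)
        if delta > nu:
            dictionary.append(x)
    return dictionary
```

The regression weights attached to the retained dictionary elements would then be tracked online, e.g. as the state of a sigma-point Kalman filter.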
2008
In a large number of applications, engineers have to estimate a function linked to the state of a dynamic system. To do so, a sequence of samples drawn from this unknown function is observed while the system transitions from state to state, and the problem is to generalize these observations to unvisited states. Several solutions can be envisioned, among them regressing a family of parameterized functions so that it fits the observed samples as well as possible. However, classical methods cannot handle the case where the actual samples are not directly observable and only a nonlinear mapping of them is available, which happens when a special sensor has to be used or when solving the Bellman equation in order to control the system. This paper introduces a method based on Bayesian filtering and kernel machines designed to solve this problem. First experimental results are promising.
IEEE Transactions on Signal Processing, 2009
Kernel-based algorithms have been a topic of considerable interest in the machine learning community over the last ten years. Their attractiveness resides in their elegant treatment of nonlinear problems. They have been successfully applied to pattern recognition, regression and density estimation. A common characteristic of kernel-based methods is that they deal with kernel expansions whose number of terms equals the number of input data, making them unsuitable for online applications. Recently, several solutions have been proposed to circumvent this computational burden in time series prediction problems. Nevertheless, most of them require excessively elaborate and costly operations. In this paper, we investigate a new model reduction criterion that makes computationally demanding sparsification procedures unnecessary. The increase in the number of variables is controlled by the coherence parameter, a fundamental quantity that characterizes the behavior of dictionaries in sparse approximation problems. We incorporate the coherence criterion into a new kernel-based affine projection algorithm for time series prediction. We also derive the kernel-based normalized LMS algorithm as a particular case. Finally, experiments are conducted to compare our approach to existing methods.
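As a rough illustration of the coherence criterion, here is a sketch of a kernel NLMS filter in which a candidate input joins the dictionary only when its largest kernel value (coherence) against the existing atoms stays below a threshold mu0. This is a simplified reading of the idea, not the authors' exact algorithm; the kernel choice and all parameter values are assumptions.

```python
import numpy as np

def rbf(x, y, sigma=0.5):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

class KNLMS:
    """Kernel normalized LMS with coherence-based dictionary control (sketch)."""
    def __init__(self, mu0=0.5, eta=0.1, eps=1e-4, sigma=0.5):
        self.mu0, self.eta, self.eps, self.sigma = mu0, eta, eps, sigma
        self.atoms, self.alpha = [], np.zeros(0)

    def predict(self, x):
        h = np.array([rbf(x, a, self.sigma) for a in self.atoms])
        return h @ self.alpha, h

    def update(self, x, d):
        if not self.atoms:
            self.atoms.append(x)
            self.alpha = np.zeros(1)
        _, h = self.predict(x)
        if np.max(np.abs(h)) <= self.mu0:
            # coherence below mu0: the input is novel enough to become an atom
            self.atoms.append(x)
            self.alpha = np.append(self.alpha, 0.0)
            h = np.append(h, rbf(x, x, self.sigma))
        # normalized LMS step on the kernel expansion coefficients
        self.alpha = self.alpha + self.eta * (d - h @ self.alpha) * h / (self.eps + h @ h)
```

Keeping mu0 small keeps the dictionary more incoherent and hence smaller, which is how the model order is controlled.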
The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to Kalman filtering, and thus the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting" and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm.
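For reference, a bare-bones KLMS update is sketched below; the comments mark the Kalman-style reading (prediction error as innovation, step size times error as gain times innovation) that the paper formalizes. This is the generic textbook KLMS, not the paper's extended variants.

```python
import numpy as np

def rbf(x, y, sigma=0.5):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

class KLMS:
    """Kernel LMS: stochastic gradient descent in the RKHS (sketch)."""
    def __init__(self, eta=0.5, sigma=0.5):
        self.eta, self.sigma = eta, sigma
        self.centers, self.coeffs = [], []

    def predict(self, x):
        return sum(a * rbf(x, c, self.sigma)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, d):
        e = d - self.predict(x)           # innovation (prediction error)
        self.centers.append(x)            # one new kernel unit per sample
        self.coeffs.append(self.eta * e)  # fixed "gain" times innovation
        return e
```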
2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), 2016
We study the relationship between online Gaussian process (GP) regression and kernel least mean squares (KLMS) algorithms. While the latter have no capacity to store the entire posterior distribution during online learning, we discover that their operation corresponds to the assumption of a fixed posterior covariance that follows a simple parametric model. Interestingly, several well-known KLMS algorithms correspond to specific cases of this model. The probabilistic perspective allows us to understand how each of them handles uncertainty, which could explain some of their performance differences.
2007
We present a Bayesian formulation of locally weighted learning (LWL) using the novel concept of a randomly varying coefficient model. Based on this, we propose a mechanism for multivariate nonlinear regression using spatially localised linear models that learn completely independently of each other, use only local information, and adapt the local model complexity in a data-driven fashion. We derive online updates for the model parameters based on variational Bayesian EM.
IEEE Transactions on Neural Networks, 2009
Sparse kernel methods are very efficient in solving regression and classification problems. The sparsity and performance of these methods depend on selecting an appropriate kernel function, which is typically achieved using a cross-validation procedure. In this paper, we propose an incremental method for supervised learning, which is similar to the relevance vector machine (RVM) but also learns the parameters of the kernels during model training. Specifically, we learn different parameter values for each kernel, resulting in a very flexible model. In order to avoid overfitting, we use a sparsity enforcing prior that controls the effective number of parameters of the model. We present experimental results on artificial data to demonstrate the advantages of the proposed method and we provide a comparison with the typical RVM on several commonly used regression and classification data sets.
Neural Networks, 2010
Kernelized LASSO (Least Absolute Selection and Shrinkage Operator) has been investigated in two separate recent papers. This paper is concerned with learning kernels under the LASSO formulation by adopting a generative Bayesian learning and inference approach. A new robust learning algorithm is proposed which produces a sparse kernel model with the capability of learning regularization parameters and kernel hyperparameters. A comparison with state-of-the-art methods for constructing sparse regression models, such as the relevance vector machine (RVM) and the local regularization assisted orthogonal least squares regression (LROLS), is given. The new algorithm is also demonstrated to possess considerable computational advantages.
2011
We present a new Gaussian process (GP) inference algorithm, called online sparse matrix Gaussian processes (OSMGP), and demonstrate its merits by applying it to the problems of head pose estimation and visual tracking. The OSMGP is based upon the observation that for kernels with local support, the Gram matrix is typically sparse. Maintaining and updating the sparse Cholesky factor of the Gram matrix can be done efficiently using Givens rotations.
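The Givens-rotation primitive that such sparse Cholesky updates rely on is simple; a minimal sketch is below (this illustrates the numerical building block only, not the OSMGP update itself).

```python
import numpy as np

def givens(a, b):
    # Return (c, s) with [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T.
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def zero_entry(R, i, j):
    # Rotate rows (j, i) of R so that R[i, j] becomes zero, preserving R^T R.
    c, s = givens(R[j, j], R[i, j])
    G = np.array([[c, s], [-s, c]])
    R[[j, i], :] = G @ R[[j, i], :]
    return R
```

Because each rotation touches only two rows, a sequence of such rotations can re-triangularize the factor after a low-rank change while preserving most of its sparsity.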
SIAM Conference on Computational Science and Engineering (CSE23), Amsterdam, February 26 - March 3, 2023
Optimal Uncertainty Quantification is a powerful mathematical tool which can be used to rigorously bound the probability of exceeding a given performance threshold under uncertain operational conditions or system characteristics. This mathematical framework can be very computationally demanding, so the use of a metamodel is highly desirable. Moreover, the robustness of the bounds obtained in this framework strongly depends on the quality of this metamodel. Therefore, one needs to build a metamodel which is quick to evaluate and as accurate as possible. An algorithm, called Spectral Kernel Ridge Regression, is introduced to design kernels from available data in Gaussian process regression surrogate modeling techniques. This algorithm is illustrated on an aerodynamic numerical example.
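The backbone of such a surrogate, plain kernel ridge regression, fits in a few lines; the spectral kernel-design step itself is the paper's contribution and is not reproduced here. The Gaussian kernel and regularization value below are assumptions.

```python
import numpy as np

def rbf_matrix(A, B, sigma=1.0):
    # Gaussian RBF Gram matrix between row-stacked inputs A and B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def krr_fit(X, y, lam=1e-3, sigma=1.0):
    # Solve (K + lam I) alpha = y
    return np.linalg.solve(rbf_matrix(X, X, sigma) + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, sigma=1.0):
    return rbf_matrix(X_test, X_train, sigma) @ alpha
```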
Journal of Multivariate Analysis, 2012
Statistical modelling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the "large p, small n" problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS), under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM, which relies on type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use a Markov chain Monte Carlo technique for computation.
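For concreteness, Vapnik's ε-insensitive loss simply ignores residuals inside a tube of half-width ε; a one-line sketch:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    # Zero inside the eps-tube, linear outside it
    return np.maximum(np.abs(np.asarray(y_true) - np.asarray(y_pred)) - eps, 0.0)
```

In the Bayesian treatment, this loss induces a pseudo-likelihood that is flat inside the tube, which is what makes a probabilistic description of SVM regression possible.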
IEEE Transactions on Signal and Information Processing over Networks
This paper proposes an approach for online training of a sparse multi-output Gaussian process (GP) model using sequentially obtained data. The considered model linearly combines multiple latent sparse GPs to produce correlated output variables. Each latent GP has its own set of inducing points to achieve sparsity. We show that, given the model hyperparameters, the posterior over the inducing points is Gaussian under Gaussian noise, since they are linearly related to the model outputs. However, the inducing points from different latent GPs become correlated, leading to a full covariance matrix that is cumbersome to handle. Variational inference is thus applied and an approximate regression technique is obtained, with which the posteriors over different inducing point sets always factorize. As the model outputs depend non-linearly on the hyperparameters, a novel marginalized particle filter (MPF)-based algorithm is proposed for the online inference of the inducing point values and hyperparameters. The approximate regression technique is incorporated in the MPF and its distributed realization is presented. Algorithm validation using synthetic and real data is conducted, and promising results are obtained.
We consider the regression problem, i.e. prediction of a real-valued function. A Gaussian process prior is imposed on the function and is combined with the training data to obtain predictions for new points. We introduce a Bayesian regularization on the parameters of the covariance function of the process, which increases the quality of approximation and the robustness of the estimation. We also propose an approach to modeling the nonstationary covariance function of a Gaussian process via a linear expansion in a parametric functional dictionary. Introducing such a covariance function makes it possible to model functions with nonhomogeneous behaviour. Combining the above features with careful optimization of the covariance function parameters results in a unified approach which can be easily implemented and applied. The resulting algorithm is an out-of-the-box solution to regression problems, with no need to tune parameters manually. The effectiveness of the method is demonstrated on various datasets.
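For orientation, the standard GP prediction equations that such approaches build on are shown below (a generic textbook sketch with an assumed Gaussian kernel, not the paper's regularized or nonstationary covariance).

```python
import numpy as np

def rbf_matrix(A, B, sigma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def gp_predict(X, y, X_star, noise=1e-2, sigma=0.5):
    # Posterior mean and covariance of a GP at test inputs X_star
    K = rbf_matrix(X, X, sigma) + noise * np.eye(len(X))
    Ks = rbf_matrix(X_star, X, sigma)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, rbf_matrix(X_star, X_star, sigma) - v.T @ v
```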
2011
It is a well-known problem that obtaining a correct bandwidth and/or smoothing parameter in nonparametric regression is difficult in the presence of correlated errors. A wide variety of methods cope with this problem, but they all critically depend on a tuning procedure which requires accurate information about the correlation structure. We propose a bandwidth selection procedure based on bimodal kernels which successfully removes the correlation without requiring any prior knowledge about its structure or its parameters. Further, we show that the form of the kernel is very important when errors are correlated, in contrast to the independent and identically distributed (i.i.d.) case. Finally, some extensions are proposed to use the proposed criterion in support vector machines and least squares support vector machines for regression.
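One kernel commonly used for this purpose (an assumed stand-in here for the kernels studied in the paper) is bimodal with K(0) = 0, so each point's own correlated noise does not contribute to its local average:

```python
import numpy as np

def bimodal_kernel(u):
    # Integrates to one, vanishes at the origin (hence "bimodal")
    return (2.0 / np.sqrt(np.pi)) * u**2 * np.exp(-u**2)

def nadaraya_watson(x0, X, y, h):
    # Local average weighted by the bimodal kernel; h is the bandwidth
    w = bimodal_kernel((X - x0) / h)
    return np.sum(w * y) / np.sum(w)
```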
2010
In kernel-based regression learning, optimizing each kernel individually is useful when the data density, the curvature of the regression surfaces (or decision boundaries), or the magnitude of the output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta-parameters.
Uncertainty in Artificial Intelligence, 2004
The paper describes an application of the Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an online algorithm which performs nearly as well as any oblivious kernel predictor. The paper contains the derivation of an estimate of the performance of this algorithm. The estimate is then used to derive an application of the Complexity Approximation Principle to kernel methods.
Proceedings of the AAAI Conference on Artificial Intelligence, 2020
This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due...
… of the 23rd international conference on …
The recent Predictive Linear Gaussian model (PLG) improves upon traditional linear dynamical system models by using a predictive representation of state, which makes consistent parameter estimation possible without any loss of modeling power while using fewer parameters. In this paper we extend the PLG to model stochastic, nonlinear dynamical systems by using kernel methods. With a Gaussian kernel, the model admits closed-form solutions to the state update equations due to conjugacy between the dynamics and the state representation. We also explore an efficient sigma-point approximation to the state updates, and show how all of the model parameters can be learned directly from data (and can be learned online with the Kernel Recursive Least-Squares algorithm). We empirically compare the model and its approximation to the original PLG and discuss their relative advantages.
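The sigma-point approximation mentioned here rests on the unscented transform; a generic sketch of sigma-point generation is below (the standard textbook formulation, not necessarily the paper's specific parameterization).

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    # 2n+1 points whose weighted sample mean/covariance match (mean, cov)
    n = len(mean)
    S = np.linalg.cholesky((n + kappa) * cov)
    pts = np.vstack([mean] +
                    [mean + S[:, i] for i in range(n)] +
                    [mean - S[:, i] for i in range(n)])
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w
```

Propagating the points through a nonlinearity f and re-averaging, e.g. `w @ f(pts)`, approximates the transformed mean without linearization.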
Computational Geosciences, 2011
One of the major limitations of the classical ensemble Kalman filter (EnKF) is the assumption of a linear relationship between the state vector and the observed data. Thus, the classical EnKF algorithm can suffer from poor performance when considering highly non-linear and non-Gaussian likelihood models. In this paper, we have formulated the EnKF based on kernel-shrinkage regression techniques. This approach makes it possible to handle highly non-linear likelihood models efficiently. Moreover, a solution to the preimage problem, essential in previously suggested EnKF schemes based on kernel methods, is not required. Testing the suggested procedure on a simple, illustrative problem with a non-linear likelihood model, we were able to obtain good results when the classical EnKF failed.
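The classical analysis step that the paper generalizes assumes a linear observation operator H; a minimal sketch of the stochastic (perturbed-observation) EnKF update follows. The kernel-shrinkage formulation replaces the implicit linear regression in this step with a regularized kernel regression to handle nonlinear likelihood models.

```python
import numpy as np

def enkf_update(ensemble, obs, H, R, rng=np.random.default_rng(0)):
    # ensemble: (n_state, N) matrix of ensemble members
    N = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = X @ X.T / (N - 1)                         # sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    D = obs[:, None] + rng.multivariate_normal(np.zeros(len(R)), R, N).T
    return ensemble + K @ (D - H @ ensemble)      # perturbed-obs analysis
```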
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
Despite their attractiveness, the popular perception is that techniques for nonparametric function approximation do not scale to streaming data due to an intractable growth in the amount of storage they require. To solve this problem in a memory-affordable way, we propose an online technique based on functional stochastic gradient descent in tandem with supervised sparsification based on greedy function subspace projections. The method, called parsimonious online learning with kernels (POLK), provides a controllable tradeoff between its solution accuracy and the amount of memory it requires. We derive conditions under which the generated function sequence converges almost surely to the optimal function, and we establish that the memory requirement remains finite. We evaluate POLK for kernel multi-class logistic regression and kernel hinge-loss classification on three canonical data sets: a synthetic Gaussian mixture model, the MNIST handwritten digits, and the Brodatz texture database. On all three tasks, we observe a favorable trade-off between objective function evaluation, classification performance, and the complexity of the nonparametric regressor extracted by the proposed method.
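A heavily simplified sketch of the idea: a functional SGD step adds one kernel unit per sample, and a budget rule then removes the least influential centre. Note that dropping the smallest coefficient is a crude stand-in for POLK's greedy subspace projection, used here only to make the memory control concrete; the kernel and all parameters are assumptions.

```python
import numpy as np

def rbf(x, y, sigma=0.5):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

class BudgetedKernelSGD:
    """Functional SGD for squared loss with a crude fixed-budget pruning rule."""
    def __init__(self, eta=0.3, budget=50, sigma=0.5):
        self.eta, self.budget, self.sigma = eta, budget, sigma
        self.centers, self.coeffs = [], []

    def predict(self, x):
        return sum(a * rbf(x, c, self.sigma)
                   for a, c in zip(self.coeffs, self.centers))

    def step(self, x, y):
        # functional gradient step: append one kernel unit
        self.coeffs.append(self.eta * (y - self.predict(x)))
        self.centers.append(x)
        # enforce the memory budget (POLK instead projects onto a subspace)
        if len(self.centers) > self.budget:
            j = int(np.argmin(np.abs(self.coeffs)))
            self.centers.pop(j)
            self.coeffs.pop(j)
```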
Automatica, 2020
Gaussian Processes (GPs) are powerful kernelized methods for non-parametric regression used in many applications. However, their use is limited to a few thousand training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work we investigate the connection between a general class of sparse inducing-point GP regression methods and Bayesian recursive estimation, which enables Kalman-filter-like updating for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the positions of the inducing points; here, instead, we focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates such parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus drastically reducing the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state-of-the-art sequential Gaussian process regression on synthetic GP data as well as real-world data with up to a million samples.
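A sketch of the recursive (Kalman-style) update for an inducing-point posterior is shown below, under the usual conditional-independence assumption of sparse GPs; it conveys the flavor of the recursion, not the paper's exact algorithm (in particular, the gradient propagation for hyperparameters is omitted). Kernel choice and noise level are assumptions.

```python
import numpy as np

def rbf_matrix(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def recursive_gp_update(m, S, Xb, yb, Z, noise=1e-2, sigma=1.0):
    # One mini-batch update of the posterior N(m, S) over inducing outputs at Z
    Kuu = rbf_matrix(Z, Z, sigma) + 1e-8 * np.eye(len(Z))
    Kxu = rbf_matrix(Xb, Z, sigma)
    J = np.linalg.solve(Kuu.T, Kxu.T).T                 # K_xu K_uu^{-1}
    B = rbf_matrix(Xb, Xb, sigma) - J @ Kxu.T           # conditional prior cov
    C = J @ S @ J.T + B + noise * np.eye(len(Xb))       # innovation covariance
    G = S @ J.T @ np.linalg.inv(C)                      # Kalman gain
    return m + G @ (yb - J @ m), S - G @ J @ S          # posterior mean, cov
```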