2016
© 2016 K. Nguyen, T. Le, V. Nguyen, T.D. Nguyen & D. Phung. The motivations of the multiple kernel learning (MKL) approach are to increase kernel expressiveness and to avoid an expensive grid search over a wide spectrum of kernels. A large body of work has been proposed to improve MKL in terms of computational cost and sparsity of the solution. However, these studies still either require an expensive grid search over the model parameters or scale unsatisfactorily with the number of kernels and training samples. In this paper, we address these issues by conjoining MKL, the Stochastic Gradient Descent (SGD) framework, and a data augmentation technique. The pathway of our proposed method is developed as follows. We first develop a maximum a posteriori (MAP) view for MKL under a probabilistic setting, described in a graphical model. This view allows us to develop a data augmentation technique that makes the inference for finding the optimal parameters feasible, as opposed to t...
2007
Abstract An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation.
Journal of Machine Learning Research, 2012
In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present a MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By having a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims.
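As a rough illustration of the approach this abstract describes, the sketch below runs stochastic sub-gradient descent on a primal MKL objective with one weight block per kernel and a block-wise (2, p)-norm penalty controlling sparsity. The explicit feature maps, step sizes, and shrinkage rule are illustrative assumptions, not the paper's exact algorithm or its convergence-rate guarantees.

```python
# Hedged sketch: primal p-norm MKL trained with stochastic (sub)gradient descent.
# Each kernel is represented by an explicit (approximate) feature map, with one
# weight block per kernel; a block-wise (2, p)-norm penalty controls how sparse
# the learned kernel combination is.
import numpy as np

def pnorm_mkl_sgd(feature_maps, y, p=1.5, lam=1e-3, epochs=5, lr=0.1, seed=0):
    """feature_maps: list of (n, d_m) arrays, one per kernel; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    W = [np.zeros(F.shape[1]) for F in feature_maps]
    for _ in range(epochs):
        for i in rng.permutation(n):
            score = sum(F[i] @ w for F, w in zip(feature_maps, W))
            # Sub-gradient step on the hinge loss for example i.
            if y[i] * score < 1:
                for F, w in zip(feature_maps, W):
                    w += lr * y[i] * F[i]
            # Block-wise shrinkage implementing the (2, p)-norm regulariser:
            # each block is scaled down by a factor derived from its current norm.
            norms = np.array([np.linalg.norm(w) + 1e-12 for w in W])
            scale = 1.0 - lr * lam * norms ** (p - 2) / np.linalg.norm(norms, ord=p) ** (p - 2)
            for w, s in zip(W, np.clip(scale, 0.0, 1.0)):
                w *= s
    return W
```

Smaller p pushes more kernel blocks towards zero (a sparser combination), while p = 2 behaves like a uniform ridge-style shrinkage over all kernels.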
2011
We present a probabilistic viewpoint on multiple kernel learning, unifying well-known regularised risk approaches and recent advances in approximate Bayesian inference relaxations. The framework proposes a general objective function, suitable for regression, robust regression and classification, that is a lower bound of the marginal likelihood and contains many regularised risk approaches as special cases. Furthermore, we derive an efficient and provably convergent optimisation algorithm. Keywords: multiple kernel learning, approximate Bayesian inference, double loop algorithms, Gaussian processes. (Footnote 1: Generalisations to other super-Gaussian potentials (log-concave or not), or models including linear couplings and mixed potentials, are given by Nickisch and Seeger (2009).)
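As a rough illustration of the kind of objective such a framework targets, the generic variational lower bound on the log marginal likelihood of a Gaussian process whose prior covariance is a weighted kernel combination looks as follows (a standard-form sketch, not the paper's exact bound):

```latex
\log p(\mathbf{y} \mid \boldsymbol{\theta})
  \;\ge\;
  \mathbb{E}_{q(\mathbf{f})}\!\bigl[\log p(\mathbf{y} \mid \mathbf{f})\bigr]
  \;-\; \mathrm{KL}\!\bigl(q(\mathbf{f}) \,\|\, p(\mathbf{f} \mid \boldsymbol{\theta})\bigr),
\qquad
p(\mathbf{f} \mid \boldsymbol{\theta}) = \mathcal{N}\!\Bigl(\mathbf{0},\; \textstyle\sum_{m=1}^{M} \theta_m \mathbf{K}_m\Bigr).
```

The expected log-likelihood term plays the role of the regularised-risk loss, while the KL term acts as the regulariser on the latent function and, implicitly, on the kernel weights.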
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2022
The deep neural network suffers from many fundamental issues in machine learning. For example, it often gets trapped in a local minimum during training, and its prediction uncertainty is hard to assess. To address these issues, we propose the so-called kernel-expanded stochastic neural network (K-StoNet) model, which incorporates support vector regression (SVR) as the first hidden layer and reformulates the neural network as a latent variable model. The former maps the input vector into an infinite-dimensional feature space via a radial basis function (RBF) kernel, ensuring the absence of local minima on its training loss surface. The latter breaks the high-dimensional nonconvex neural network training problem into a series of low-dimensional convex optimization problems, and enables its prediction uncertainty to be easily assessed. K-StoNet can be easily trained using the imputation-regularized optimization (IRO) algorithm. Compared to traditional deep neural networks, K-StoNet possesses a theoretical guarantee of asymptotic convergence to the global optimum and enables the prediction uncertainty to be easily assessed. The performance of the new model in training, prediction and uncertainty quantification is illustrated with simulated and real data examples.
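The sketch below only illustrates the architectural idea of a kernel-expanded first layer: inputs are mapped into an RBF kernel feature space and a small network is fit on top of that representation. It is not the latent-variable formulation, the per-unit SVR layer, or the IRO training algorithm of the K-StoNet paper; the helper names are placeholders.

```python
# Loose structural sketch: an RBF kernel expansion as the "first layer",
# followed by a small neural network fit on the kernel features.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neural_network import MLPRegressor

def fit_kernel_expanded_net(X_train, y_train, gamma=0.5):
    # First "layer": RBF kernel expansion of each input against the training inputs.
    Phi = rbf_kernel(X_train, X_train, gamma=gamma)      # (n, n) kernel features
    # Remaining layers: a small network trained on the kernel features.
    net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(Phi, y_train)
    return net, X_train, gamma

def predict_kernel_expanded_net(model, X_new):
    net, X_anchor, gamma = model
    return net.predict(rbf_kernel(X_new, X_anchor, gamma=gamma))
```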
Recent studies have shown that multiple kernel learning is very effective for object recognition, leading to the popularity of kernel learning in computer vision problems. In this work, we develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. Although several algorithms have been developed for ML-MKL, their computational cost is linear in the number of classes, making them unscalable when the number of classes is large, a challenge frequently encountered in visual object recognition. We address this computational challenge by developing a framework for ML-MKL that combines the worst-case analysis with stochastic approximation. Our analysis shows that the complexity of our algorithm is O(m^{1/3} √(ln m)), where m is the number of classes. Empirical studies with object recognition show that while achieving similar classification accuracy, the proposed method is significantly more efficient than the state-of-the-art algorithms for ML-MKL.
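The sketch below captures only the two structural ingredients named above: a single kernel-weight vector shared by all classes, updated by a stochastic-approximation step that focuses on the worst-performing class at each iteration. The per-class score used here (an unnormalised kernel-target alignment on a mini-batch) is a simple stand-in for the paper's SVM-based worst-case objective, so the complexity guarantee quoted above does not apply to this toy version.

```python
# Hedged sketch: shared kernel weights for multi-label MKL, updated by a
# projected stochastic-approximation step driven by the worst-scoring class.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def worst_case_shared_mkl(kernels, Y, steps=200, lr=0.05, batch=64, seed=0):
    """kernels: list of (n, n) kernel matrices; Y: (n, m) matrix of {-1, +1} labels."""
    rng = np.random.default_rng(seed)
    M, m = len(kernels), Y.shape[1]
    w = np.full(M, 1.0 / M)                              # one weight vector for all classes
    for t in range(1, steps + 1):
        idx = rng.choice(len(Y), size=min(batch, len(Y)), replace=False)
        Ks = [Km[np.ix_(idx, idx)] for Km in kernels]    # mini-batch sub-kernels
        K = sum(wi * Ki for wi, Ki in zip(w, Ks))
        # Surrogate per-class score: unnormalised kernel-target alignment.
        scores = np.array([Y[idx, k] @ K @ Y[idx, k] for k in range(m)])
        k_worst = int(np.argmin(scores))                 # focus on the worst class
        yk = Y[idx, k_worst]
        grad = np.array([yk @ Km @ yk for Km in Ks])
        w = project_simplex(w + (lr / np.sqrt(t)) * grad)  # stochastic-approximation step
    return w
```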
2010
Recently, it has been proposed to combine multiple kernels using a weighted linear sum. In certain applications, different kernels may use different input representations, and these methods consider neither the cost of acquiring those representations nor the cost of evaluating the kernels. We generalize the framework of Multiple Kernel Learning (MKL) to this cost-conscious methodology.
2010 20th International Conference on Pattern Recognition, 2010
In this paper, we propose a novel large-margin based approach for multiple kernel learning (MKL) using biconvex optimization, called Adaptive Multiple Kernel Learning (AdaMKL). To learn the weights for support vectors and the kernel coefficients, AdaMKL minimizes the objective function alternately, learning one component while fixing the other at a time, so that only one convex formulation needs to be solved at each step. We also propose a family of biconvex objective functions with an arbitrary p-norm (p ≥ 1) of kernel coefficients. As our experiments show, AdaMKL performs comparably with the state-of-the-art convex-optimization-based MKL approaches, but its learning is much simpler and faster.
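The sketch below shows the generic alternating structure described above: with the kernel coefficients fixed, a standard SVM is solved on the combined kernel; with the SVM solution fixed, the coefficients are refreshed in closed form under a p-norm constraint. The update rule used here is the well-known closed-form l_p-norm MKL step, standing in for AdaMKL's own biconvex steps, which differ in detail.

```python
# Hedged sketch: alternating optimisation for MKL with a p-norm constraint on
# the kernel coefficients, reusing an off-the-shelf SVM solver in each round.
import numpy as np
from sklearn.svm import SVC

def alternating_mkl(kernels, y, p=2.0, C=1.0, n_iter=15):
    """kernels: list of (n, n) precomputed kernel matrices; y: labels in {-1, +1}."""
    M = len(kernels)
    d = np.full(M, M ** (-1.0 / p))                      # start on the l_p unit ball
    for _ in range(n_iter):
        # Step 1: fix d, solve the SVM on the combined kernel.
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        sv, coef = svm.support_, svm.dual_coef_.ravel()  # y_i * alpha_i on support vectors
        # Step 2: fix the SVM solution, refresh d from the per-kernel margin terms.
        S = np.array([coef @ Km[np.ix_(sv, sv)] @ coef for Km in kernels])
        wnorm2 = d ** 2 * S                              # ||w_m||^2 for each kernel
        d = wnorm2 ** (1.0 / (p + 1))
        d /= np.linalg.norm(d, ord=p) + 1e-12            # keep ||d||_p = 1
    return d, svm
```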
2011
Abstract In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms.
2008
Abstract Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally.
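The gating mechanism described above can be illustrated with a short sketch: a softmax gating model assigns each input a data-dependent weight per kernel, and the combined kernel between two points multiplies in both points' gate values. The gate parameters V and b are placeholders here (in LMKL they are learned jointly with the SVM), so this only shows how the locally combined kernel matrix is formed.

```python
# Sketch of locally combined kernels with a softmax gating model:
# k_eta(x_i, x_j) = sum_m eta_m(x_i) * k_m(x_i, x_j) * eta_m(x_j).
import numpy as np

def gates(X, V, b):
    """Softmax gating: X is (n, d), V is (M, d), b is (M,); returns (n, M) gate values."""
    logits = X @ V.T + b
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def locally_combined_kernel(X, kernels, V, b):
    """kernels: list of M (n, n) base kernel matrices evaluated on the rows of X."""
    G = gates(X, V, b)                                    # (n, M) local kernel weights
    return sum(np.outer(G[:, m], G[:, m]) * Km            # eta_m(x_i) * k_m * eta_m(x_j)
               for m, Km in enumerate(kernels))
```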
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2020
A spectral clustering method is used in our project to exploit the spectral graph structure for partitioning the affinity matrix; the affinity matrix is a weighted adjacency matrix of the data. The project describes an effective approach to reduce or minimize local and global noise, and we use multiple kernel learning (MKL) to extract the local and global noise. A block coordinate descent algorithm is used to solve the optimization problem. The method is an unsupervised robust multiple kernel learning approach: being unsupervised, it decreases the chance of inaccuracy, does not rely on preset conditions but works according to the given conditions, and the fuzziness is set by the MKL algorithm. In this paper we analyze the local and global noise and characterize them accordingly. Since we propose it in an unsupervised manner, the results remain valid compared with a supervised approach. We use numerical values, a function f, and a graph to extract the global and local noise from the matrix; after extracting the local and global noise we can remove the corrupted data. We used a total of 14 datasets to evaluate the effectiveness of our method. Simultaneously, we learn a consensus kernel by minimizing the disagreement over the cleaned kernels.

Keywords - Unsupervised robust multiple kernel learning, affinity matrix, block coordinate descent algorithm.

I. INTRODUCTION

Kernel-based clustering algorithms, which include kernel k-means, have the potential to capture the non-linear inherent shape of many real-world data sets and thereby commonly achieve better clustering performance than linear partition methods. In real-world applications, we can construct numerous kernels by applying different kernel functions. However, the performance of clustering depends heavily on the choice of kernels. Unfortunately, it is still a challenge to determine a suitable one among an extensive range of possible kernels for the given data and task in advance. This is even harder in unsupervised learning tasks such as clustering, because of the absence of labels. To overcome this difficulty, many unsupervised multiple kernel learning methods have been proposed, which aim to learn a consensus kernel from a user-defined pool of kernels. Conventional unsupervised multiple kernel learning methods learn a consensus kernel by linearly combining a set of candidate kernels. For instance, one approach provided a multi-view spectral clustering method which linearly combined spectral embeddings to obtain the final clustering; another proposed a localized multiple kernel k-means technique for cancer biology applications. Liu et al. proposed a multiple kernel learning approach that learns the optimal neighborhood for clustering. The above techniques concentrate on combining all candidate kernels, while ignoring the robustness of the methods. However, in real-world applications, the kernels are frequently contaminated by noise. For instance, since the original data may contain noise and outliers, the kernel constructed from these data will also be contaminated. In addition, even when the data are clean, incorrect kernel functions may also introduce noise. To alleviate the effect of noise, some robust multiple kernel learning methods have recently been proposed. These methods focus on the noise caused by corrupted instances, but cannot capture the noise caused by kernel functions well.
Note that a few multiple kernel learning methods that assign larger weights to the more suitable kernels can capture the noise caused by kernel functions to some extent. However, in these methods the weight is imposed on all elements of the kernel matrix, which is a bit too coarse. For example, some inappropriate kernels may have a very low weight in these methods, which means that all elements, including the useful parts of the kernel matrix, will share the same low weight and will not be useful for kernel learning. To handle noise more comprehensively, in this paper we propose a novel Local and Global De-noising Multiple Kernel Learning method. We observe that the kernel matrix may contain two types of noise: one caused by contaminated instances, and the other caused by inappropriate kernel functions. The figures show the original data set and a good Gaussian kernel, where the blue color means the kernel value is small. For the first kind of noise, once an instance is contaminated by noise, both the corresponding row and column of the kernel matrix will also be contaminated.
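As a simplified illustration of the consensus-kernel idea described in the abstract, the sketch below first "cleans" each candidate kernel (here by symmetrising and clipping extreme entries, a crude stand-in for the paper's local/global de-noising) and then alternates, block-coordinate style, between updating the consensus kernel and the kernel weights so as to minimize the weighted disagreement. The cleaning rule and the weighting objective are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: block coordinate descent on J(w, K*) = sum_i w_i^2 ||K* - K_i||_F^2
# subject to sum_i w_i = 1, using crudely cleaned candidate kernels.
import numpy as np

def consensus_kernel(kernels, n_iter=20):
    """kernels: list of (n, n) candidate kernel matrices."""
    cleaned = []
    for K in kernels:
        lo, hi = np.percentile(K, [1, 99])
        cleaned.append(np.clip((K + K.T) / 2, lo, hi))   # symmetrise and clip outlier entries
    M = len(cleaned)
    w = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        # Block 1: consensus kernel minimising the weighted disagreement for fixed w.
        w2 = w ** 2
        K_star = sum(wi * Ki for wi, Ki in zip(w2, cleaned)) / w2.sum()
        # Block 2: weights in closed form for the current consensus
        # (w_i proportional to 1 / ||K* - K_i||_F^2 under sum_i w_i = 1).
        d = np.array([np.linalg.norm(K_star - Ki, "fro") ** 2 + 1e-12 for Ki in cleaned])
        w = (1.0 / d) / np.sum(1.0 / d)
    return K_star, w
```

Kernels that disagree strongly with the emerging consensus (for example because they are built from noisy instances or an ill-suited kernel function) receive small weights and contribute little to the final consensus kernel.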