2012, Journal of Machine Learning Research
In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present an MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. Through a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims.
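To make the primal, stochastic flavour of such an approach concrete, here is a minimal toy sketch — not the authors' algorithm — of perceptron-style hinge-loss updates over precomputed Gram matrices, with the kernel weights rescaled so their p-norm stays bounded. All function names, constants, and the update rule below are illustrative assumptions.

    import numpy as np

    def pnorm_mkl_sgd(kernels, y, p=1.5, epochs=20, lr=0.1, seed=0):
        # kernels: list of (n, n) precomputed Gram matrices; y: labels in {-1, +1}
        rng = np.random.default_rng(seed)
        n, m = len(y), len(kernels)
        alpha = np.zeros(n)              # shared example coefficients
        eta = np.full(m, 1.0 / m)        # kernel weights, kept inside the p-norm ball
        for _ in range(epochs):
            for i in rng.permutation(n):
                per_kernel = np.array([K[i] @ (alpha * y) for K in kernels])
                score = eta @ per_kernel               # combined-kernel prediction
                if y[i] * score < 1.0:                 # hinge-loss violation
                    alpha[i] += lr                     # strengthen this example
                    eta = np.maximum(eta + lr * y[i] * per_kernel, 0.0)  # reward useful kernels
                    norm = (eta ** p).sum() ** (1.0 / p)
                    if norm > 1.0:                     # rescale so the p-norm stays at most 1
                        eta /= norm
        return alpha, eta

In this toy version a p close to 1 keeps the weight vector closer to a sparse combination of kernels, mirroring the role of the sparsity-controlling parameter described above.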
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
Several object categorization algorithms use kernel methods over multiple cues, as they offer a principled approach to combining multiple cues and obtaining state-of-the-art performance. A general drawback of these strategies is the high computational cost during training, which prevents their application to large-scale problems. They also do not provide theoretical guarantees on their convergence rate.
2007
An efficient and general multiple kernel learning (MKL) algorithm has recently been proposed by Sonnenburg et al. (2006). This approach has opened new perspectives, since it makes the MKL approach tractable for large-scale problems by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation.
Recent studies have shown that multiple kernel learning is very effective for object recognition, leading to the popularity of kernel learning in computer vision problems. In this work, we develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. Although several algorithms have been developed for ML-MKL, their computational cost is linear in the number of classes, making them unscalable when the number of classes is large, a challenge frequently encountered in visual object recognition. We address this computational challenge by developing a framework for ML-MKL that combines worst-case analysis with stochastic approximation. Our analysis shows that the complexity of our algorithm is O(m^{1/3} √(ln m)), where m is the number of classes. Empirical studies with object recognition show that while achieving similar classification accuracy, the proposed method is significantly more efficient than the state-of-the-art algorithms for ML-MKL.
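To put that rate in perspective (the numbers here are our own back-of-the-envelope illustration): for m = 1000 classes,

    m^{1/3}\sqrt{\ln m} = 1000^{1/3}\sqrt{\ln 1000} \approx 10 \times 2.63 \approx 26,

so the class-dependent factor grows to roughly 26 rather than the factor of 1000 paid by methods whose cost is linear in the number of classes.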
2011
Many state-of-the-art approaches for Multi Kernel Learning (MKL) struggle to find a compromise between performance, sparsity of the solution, and speed of the optimization process. In this paper we look at the MKL problem from a learning and an optimization point of view at the same time. Instead of designing a regularizer and then struggling to find an efficient method to minimize it, we design the regularizer while keeping the optimization algorithm in mind. Hence, we introduce a novel MKL formulation, which mixes elements of p-norm and elastic-net regularization. We also propose a fast stochastic gradient descent method that solves the novel MKL formulation. We show theoretically and empirically that our method has 1) state-of-the-art performance on many classification tasks; 2) exact sparse solutions with a tunable level of sparsity; 3) a convergence rate bound that depends only logarithmically on the number of kernels used, and is independent of the sparsity required; 4) independence of the particular convex loss function used.
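As a purely schematic illustration of what such a mixed regularizer can look like (this is our own generic form, not necessarily the paper's exact one), with w_k denoting the predictor block associated with kernel k:

    \Omega(\bar{w}) = \lambda\Big[(1-\mu)\,\tfrac{1}{2}\sum_{k=1}^{F}\|w_k\|_2^2 + \mu\Big(\sum_{k=1}^{F}\|w_k\|_2^p\Big)^{1/p}\Big], \qquad 1 \le p \le 2 .

The quadratic term keeps the objective easy to optimize with stochastic gradients, while the group p-norm term drives entire blocks, and hence entire kernels, to zero, which is what yields the tunable sparsity claimed above.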
2016
© 2016 K. Nguyen, T. Le, V. Nguyen, T.D. Nguyen & D. Phung. The motivations of the multiple kernel learning (MKL) approach are to increase kernel expressiveness capacity and to avoid the expensive grid search over a wide spectrum of kernels. A large amount of work has been proposed to improve MKL in terms of computational cost and sparsity of the solution. However, these studies still either require an expensive grid search on the model parameters or scale unsatisfactorily with the number of kernels and training samples. In this paper, we address these issues by conjoining MKL, the Stochastic Gradient Descent (SGD) framework, and a data augmentation technique. The pathway of our proposed method is developed as follows. We first develop a maximum-a-posteriori (MAP) view for MKL under a probabilistic setting, described by a graphical model. This view allows us to develop a data augmentation technique to make the inference for finding the optimal parameters feasible, as opposed to t...
2016
Multiple Kernel Learning (MKL) methods are known for their effectiveness in solving classification and regression problems involving multimodal data. Many MKL approaches use a linear combination of base kernels, resulting in somewhat limited feature representations. Several non-linear MKL formulations were proposed recently. They provide much higher-dimensional feature spaces and, therefore, richer representations. However, these methods often lead to non-convex optimization and to an intractable number of optimization parameters. In this paper, we propose a new non-linear MKL method that utilizes nuclear norm regularization and leads to a convex optimization problem. The proposed Nuclear-norm-Constrained MKL (NuC-MKL) algorithm converges faster and requires a smaller number of calls to an SVM solver, as compared to other competing methods. Moreover, the number of model support vectors in our approach is usually much smaller than in other methods. This suggests that our algorith...
2010
Recently, it has been proposed to combine multiple kernels using a weighted linear sum. In certain applications, different kernels may use different input representations, and these methods consider neither the cost of acquiring them nor the cost of evaluating the kernels. We generalize the framework of Multiple Kernel Learning (MKL) for this cost-conscious methodology.
We present a probabilistic viewpoint on multiple kernel learning, unifying well-known regularised risk approaches and recent advances in approximate Bayesian inference relaxations. The framework proposes a general objective function, suitable for regression, robust regression, and classification, that is a lower bound of the marginal likelihood and contains many regularised risk approaches as special cases. Furthermore, we derive an efficient and provably convergent optimisation algorithm.
2012
We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learning a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieving the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance-sampling-based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show the surprising result that, in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution to be implemented in O(log(d)) time, making the total computational cost of the method to reach an ε-optimal solution O(log(d)/ε²), thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.
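A minimal sketch of the importance-sampling idea described above (function and variable names are ours, not the paper's): one coordinate of the gradient is drawn with probability proportional to its magnitude, and the surviving entry is rescaled by the inverse probability so the one-coordinate estimate stays unbiased.

    import numpy as np

    def sampled_gradient(grad, rng):
        # Draw one coordinate with probability proportional to |gradient component|,
        # then divide by that probability so the expectation equals the full gradient.
        probs = np.abs(grad).astype(float)
        total = probs.sum()
        if total == 0.0:
            return np.zeros_like(probs)
        probs /= total
        j = rng.choice(len(grad), p=probs)
        estimate = np.zeros_like(probs)
        estimate[j] = grad[j] / probs[j]
        return estimate

    rng = np.random.default_rng(0)
    g = np.array([0.5, -0.1, 0.0, 2.0])
    print(sampled_gradient(g, rng))   # unbiased single-coordinate estimate of g

Sampling proportionally to the gradient magnitudes is what keeps the variance of such single-coordinate estimates low, which is the property the abstract refers to.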
Neural Computation, 2013
This review examines kernel methods for online learning, in particular multiclass classification. We examine margin-based approaches, stemming from Rosenblatt's original perceptron algorithm, as well as nonparametric probabilistic approaches that are based on the popular Gaussian process framework. We also examine approaches to online learning that use combinations of kernels: online multiple kernel learning. We present empirical validation of a wide range of methods on a protein fold recognition data set, where different biological feature types are available, and two object recognition data sets, Caltech101 and Caltech256, where multiple feature spaces are available in terms of different image feature extraction methods.
Proceedings of the AAAI Conference on Artificial Intelligence
We propose Veto-Consensus Multiple Kernel Learning (VCMKL), a novel way of combining multiple kernels such that one class of samples is described by the logical intersection (consensus) of base kernelized decision rules, whereas the other class is described by the union (veto) of their complements. The proposed configuration is a natural fit for domain description and learning with hidden subgroups. We first provide a generalization risk bound in terms of the Rademacher complexity of the classifier, and then formulate a large-margin multi-ν learning objective with a tunable training error bound. Seeing that the corresponding optimization is non-convex and that existing methods severely suffer from local minima, we establish a new algorithm, namely the Parametric Dual Descent Procedure (PDDP), that can approach the global optimum with guarantees. The bases of PDDP are two theorems that reveal the global convexity and local explicitness of the parameterized dual optimum, for which a series of new techniques f...
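A toy illustration of the veto/consensus combination (the function and predicate names are ours): the positive class must satisfy every base kernelized rule, so a single violated rule vetoes it; by De Morgan's law the complement class is exactly the union of the complements.

    from typing import Callable, Sequence

    def consensus_predict(x, base_rules: Sequence[Callable]) -> bool:
        # Positive class: logical intersection (consensus) of all base decisions.
        # The other class is then the union (veto) of their complements.
        return all(rule(x) > 0.0 for rule in base_rules)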
We propose a localized approach to multiple kernel learning that, in contrast to prevalent approaches, can be formulated as a convex optimization problem over a given cluster structure. From this formulation we obtain the first generalization error bounds for localized multiple kernel learning and derive an efficient optimization algorithm based on the Fenchel dual representation. Experiments on real-world datasets from the application domains of computational biology and computer vision show that the convex approach to localized multiple kernel learning can achieve higher prediction accuracies than its global and non-convex local counterparts.
PLoS ONE, 2012
Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques make it possible to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, so-called ℓ1-norm MKL variants are often observed to be outperformed by an unweighted sum kernel. The contribution of this paper is twofold: We apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks within computer vision. We provide insights on the benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum-kernel SVM and sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets.
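For reference, the unweighted sum-kernel baseline mentioned above amounts to averaging the Gram matrices before training an ordinary SVM. A minimal sketch (variable names ours) using scikit-learn's precomputed-kernel interface:

    import numpy as np
    from sklearn.svm import SVC

    def sum_kernel_svm(train_kernels, y_train, C=1.0):
        # train_kernels: list of (n, n) Gram matrices computed on the training set
        K_avg = np.mean(train_kernels, axis=0)     # unweighted kernel combination
        clf = SVC(C=C, kernel="precomputed")
        clf.fit(K_avg, y_train)
        return clf

At test time the same averaging is applied to the test-versus-train kernel matrices before calling clf.predict.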
2011
Training structured predictors often requires considerable time selecting features or tweaking the kernel. Multiple kernel learning (MKL) sidesteps this issue by embedding the kernel learning into the training procedure. Despite the recent progress towards efficiency and scalability of MKL algorithms, the structured-output case remains an open research front.
2010 20th International Conference on Pattern Recognition, 2010
In this paper, we propose a novel large-margin-based approach for multiple kernel learning (MKL) using biconvex optimization, called Adaptive Multiple Kernel Learning (AdaMKL). To learn the weights for support vectors and the kernel coefficients, AdaMKL minimizes the objective function alternately, learning one component while fixing the other at a time, so that only one convex formulation needs to be solved at each step. We also propose a family of biconvex objective functions with an arbitrary p-norm (p ≥ 1) of the kernel coefficients. As our experiments show, AdaMKL performs comparably with the state-of-the-art convex-optimization-based MKL approaches, but its learning is much simpler and faster.
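A rough sketch of the alternating structure just described (the two solver callbacks are hypothetical placeholders, not AdaMKL's actual subroutines): each pass fixes one set of variables and solves a single convex problem in the other.

    import numpy as np

    def alternating_mkl(kernels, y, fit_svm, update_coeffs, iters=10):
        # kernels: list of (n, n) Gram matrices; fit_svm and update_coeffs stand in
        # for the two convex subproblems that are solved in turn.
        beta = np.full(len(kernels), 1.0 / len(kernels))     # kernel coefficients
        model = None
        for _ in range(iters):
            K = sum(b * Kk for b, Kk in zip(beta, kernels))  # current combined kernel
            model = fit_svm(K, y)                   # step 1: support-vector weights, beta fixed
            beta = update_coeffs(kernels, y, model) # step 2: kernel coefficients, model fixed
        return model, beta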
2009
Combining information from various image descriptors has become a standard technique for image classification tasks. Multiple kernel learning (MKL) approaches make it possible to determine the optimal combination of such similarity matrices and the optimal classifier simultaneously. Most MKL approaches employ an ℓ1-regularization on the mixing coefficients to promote sparse solutions, an assumption that is often violated in image applications where descriptors hardly encode orthogonal pieces of information. In this paper, we compare ℓ1-MKL with a recently developed non-sparse MKL in object classification tasks. We show that the non-sparse MKL outperforms both the standard MKL and SVMs with average kernel mixtures on the PASCAL VOC data sets.
IEEE Transactions on Neural Networks and Learning Systems, 2014
Over the past few years, Multi-Kernel Learning (MKL) has received significant attention among data-driven feature selection techniques in the context of kernel-based learning. MKL formulations have been devised and solved for a broad spectrum of machine learning problems, including Multi-Task Learning (MTL). Solving different MKL formulations usually involves designing algorithms that are tailored to the problem at hand, which is typically a non-trivial accomplishment.
2009
In this paper, we present a novel multiple kernel method to learn the optimal classification function for visual concepts. Although many carefully designed kernels have been proposed in the literature to measure visual similarity, little work has been done on how these kernels really affect the learning performance. We propose a Per-Sample Based Multiple Kernel Learning method (PS-MKL) to investigate the discriminative power of each training sample in different basic kernel spaces. The optimal, sample-specific kernel is learned as a linear combination of a set of basic kernels, which leads to a convex optimization problem with a unique global optimum. As illustrated in the experiments on the Caltech 101 and the Wikipedia MM data sets, the proposed PS-MKL outperforms traditional Multiple Kernel Learning (MKL) methods and achieves comparable results with the state-of-the-art methods of learning visual concepts.
A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as demonstrated for ALMA_2. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy.
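A rough sketch of the projection step described above (the helper name and the tolerance are ours): a new instance is represented in the span of the stored instances whenever its projection residual in the kernel space is small enough; otherwise it would be added to the support set.

    import numpy as np

    def try_project(K_support, k_new, k_self, tol=1e-1):
        # K_support: Gram matrix of stored instances; k_new: kernel values between the
        # new instance and the stored ones; k_self: k(x, x) for the new instance.
        d = np.linalg.solve(K_support + 1e-10 * np.eye(len(K_support)), k_new)
        delta_sq = k_self - k_new @ d          # squared norm of the projection residual
        return d, delta_sq, delta_sq <= tol ** 2

The coefficients d express the projected instance as a combination of the stored ones, so the hypothesis can absorb the update without growing its memory footprint.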
BMC Bioinformatics, 2010
Background: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL), such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over the sparse integration method for thoroughly combining complementary information in heterogeneous data sources. Results: We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large-scale data set processing.
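Schematically, and only as an illustration of the norm-in-the-dual idea rather than the paper's exact formulation, this family of methods can be written as a dual problem with a norm taken over the per-kernel quadratic terms:

    \max_{\alpha}\; \sum_{i}\alpha_i \;-\; \tfrac{1}{2}\,\Big\|\big(\alpha^{\top}\mathrm{diag}(y)\,K_k\,\mathrm{diag}(y)\,\alpha\big)_{k=1}^{m}\Big\|_{*} ,

where choosing L∞, L1, or L2 for the norm ‖·‖_* gives the corresponding MKL variant mentioned above; the L∞ choice concentrates weight on few kernels (sparse coefficients), while the L2 choice spreads it across kernels (non-sparse coefficients).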