DEFEATnet -- A Deep Conventional Image Representation for Image Classification
IEEE Transactions on Circuits and Systems for Video Technology, 2015
ABSTRACT To study the underlying reasons for the successes of conventional image representations and deep neural networks in image representation, we propose a DEep FEATure extraction, encoding, and pooling network (DEFEATnet) architecture, which is a marriage between conventional image representation approaches and deep neural networks. In particular, each layer of DEFEATnet consists of three components: feature extraction, feature encoding, and pooling. The primary advantage of DEFEATnet is twofold: i) it preserves the prior knowledge (e.g., translation invariance) gained from extracting, encoding, and pooling handcrafted features, as in conventional feature representation approaches; ii) it represents object parts at different granularities by gradually enlarging the local receptive fields across layers, as in deep neural networks. Moreover, DEFEATnet is a generalized framework that can readily incorporate any type of local feature as well as any well-designed feature encoding and pooling method. Since prior knowledge is preserved in DEFEATnet, it is especially useful for image representation on small/medium-sized datasets, where deep neural networks usually fail due to the lack of sufficient training data. Promising experimental results show that DEFEATnets outperform shallow conventional image representation approaches by a large margin when the same type of features, feature encoding, and pooling are used. The extensive experiments also demonstrate the effectiveness of the deep architecture of DEFEATnet in improving the robustness of the image representation.
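The abstract does not specify the concrete features, codebooks, or pooling operators, but a minimal sketch of one stacked extract-encode-pool layer might look like the following, assuming dense patch extraction, hard vector-quantization encoding against a learned dictionary, and max pooling over spatial cells as stand-ins for the handcrafted components; the function names (e.g., defeat_layer) and parameter choices are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def extract_patches(feat_map, patch):
    """Densely extract local patches (the 'feature extraction' stage).

    feat_map: (H, W, C) array; patch: receptive-field side length.
    Returns (n_patches, patch*patch*C) descriptors and the patch grid shape.
    """
    H, W, C = feat_map.shape
    gh, gw = H - patch + 1, W - patch + 1
    descs = np.stack([
        feat_map[i:i + patch, j:j + patch].ravel()
        for i in range(gh) for j in range(gw)
    ])
    return descs, (gh, gw)

def encode_vq(descs, dictionary):
    """Hard vector-quantization encoding against a codebook
    (a simple stand-in for the feature-encoding stage)."""
    # squared distance to every codeword, then one-hot assignment
    d2 = ((descs[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    codes = np.zeros((descs.shape[0], dictionary.shape[0]))
    codes[np.arange(descs.shape[0]), d2.argmin(1)] = 1.0
    return codes

def pool_max(codes, grid, cell):
    """Max-pool codes over non-overlapping spatial cells, producing the
    next layer's smaller feature map with a larger effective receptive field."""
    gh, gw = grid
    K = codes.shape[1]
    cmap = codes.reshape(gh, gw, K)
    oh, ow = gh // cell, gw // cell
    out = np.zeros((oh, ow, K))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = cmap[i * cell:(i + 1) * cell,
                             j * cell:(j + 1) * cell].max(axis=(0, 1))
    return out

def defeat_layer(feat_map, dictionary, patch, cell):
    """One layer in the spirit of DEFEATnet: extract -> encode -> pool."""
    descs, grid = extract_patches(feat_map, patch)
    codes = encode_vq(descs, dictionary)
    return pool_max(codes, grid, cell)

# Toy usage: two stacked layers on a random 'image'; later layers see
# progressively larger image regions, mirroring the growing receptive fields.
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                    # layer-1 input
D1 = rng.random((64, 5 * 5 * 3))               # 64 codewords for 5x5 patches
x1 = defeat_layer(x, D1, patch=5, cell=2)      # -> (14, 14, 64)
D2 = rng.random((128, 3 * 3 * 64))             # layer 2 codebook
x2 = defeat_layer(x1, D2, patch=3, cell=2)     # -> (6, 6, 128)
print(x1.shape, x2.shape)
```

In practice the dictionaries would be learned from training patches (e.g., by k-means) and the encoder could be swapped for any well-designed encoding method, which is the flexibility the framework claims.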