2021
We propose a novel Neyman-Pearson (NP) classification algorithm, which achieves the maximum detection rate and meanwhile keeps the false alarm rate around a user-specified threshold. The proposed method processes data in an online framework with nonlinear modeling capabilities by transforming the observations into a high dimensional space via the random Fourier features. After this transformation, we use a linear classifier whose parameters are sequentially learned. We emphasize that our algorithm is the first online Neyman-Pearson classifier in the literature, which is suitable for both linearly and nonlinearly separable datasets. In our experiments, we investigate the performance of our algorithm on well-known datasets and observe that the proposed online algorithm successfully learns the nonlinear class separations (by outperforming the linear models) while matching the desired false alarm rate.
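The random Fourier feature transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); the function names `make_rff`, `rff_map` and `rbf`, and the parameter `gamma`, are illustrative choices and are not taken from the paper. The idea is that an inner product of the mapped vectors approximates the kernel value, so a linear classifier on the mapped data behaves like a kernel classifier.

```python
import math
import random


def make_rff(dim, n_features, gamma, rng):
    """Draw frequencies and phases for k(x, y) = exp(-gamma * ||x - y||^2).

    For this kernel the frequencies are Gaussian with variance 2 * gamma
    per coordinate, and the phases are uniform on [0, 2*pi).
    """
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Map x to the random feature vector z(x) = sqrt(2/D) * cos(W x + b)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [
        scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(freqs, phases)
    ]


def rbf(x, y, gamma):
    """Exact Gaussian kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    rng = random.Random(0)
    freqs, phases = make_rff(dim=2, n_features=4000, gamma=0.5, rng=rng)
    x, y = [0.3, -0.2], [0.1, 0.4]
    zx, zy = rff_map(x, freqs, phases), rff_map(y, freqs, phases)
    approx = sum(a * b for a, b in zip(zx, zy))
    print("approximate kernel:", approx, "exact kernel:", rbf(x, y, 0.5))
```

The approximation error shrinks roughly as one over the square root of the number of random features, so the feature count trades accuracy against the cost of each online update.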
IEEE Access
We propose a novel Neyman-Pearson (NP) classifier that is both online and nonlinear for the first time in the literature. The proposed classifier operates on a binary labeled data stream in an online manner, and maximizes the detection power about a user-specified and controllable false positive rate. Our NP classifier is a single hidden layer feedforward neural network (SLFN), which is initialized with random Fourier features (RFFs) to construct the kernel space of the radial basis function at its hidden layer with sinusoidal activation. Not only does this use of RFFs provide an excellent initialization with great nonlinear modeling capability, but it also exponentially reduces the parameter complexity and compactifies the network to mitigate overfitting while improving the processing efficiency substantially. We sequentially learn the SLFN with stochastic gradient descent updates based on a Lagrangian NP objective. As a result, we obtain an expedited online adaptation and powerful nonlinear Neyman-Pearson modeling. Our algorithm is appropriate for large scale data applications and provides decent false positive rate controllability with real time processing, since it has only O(N) computational and O(1) space complexity (N: number of data instances). In our extensive set of experiments on several real datasets, our algorithm is highly superior to the competing state-of-the-art techniques, either by outperforming them in terms of the NP classification objective with comparable computational and space complexity, or by achieving comparable performance with significantly lower complexity.
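The idea of stochastic gradient descent on a Lagrangian NP objective can be sketched as a primal-dual update. This is a simplified illustration under stated assumptions, not the authors' update rule: it uses a plain linear model instead of an SLFN, a logistic surrogate loss, and a multiplier `lam` that is raised whenever a negative example is falsely flagged and lowered otherwise. The names `np_online_sgd`, `make_stream`, `target_fpr`, `lr` and `dual_lr` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def np_online_sgd(stream, dim, target_fpr, lr=0.05, dual_lr=0.05):
    """One pass over (x, y) pairs with y in {+1, -1}.

    Assumed Lagrangian: loss on positives + lam * (false alarm rate - target_fpr).
    The weights take a gradient step on the logistic surrogate; lam takes a
    projected ascent step using the observed 0/1 false alarm indicator.
    """
    w = [0.0] * dim
    b = 0.0
    lam = 1.0
    for x, y in stream:
        score = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        # Derivative of log(1 + exp(-y * score)) with respect to score.
        grad_score = -y * sigmoid(-y * score)
        # Negative-class examples are weighted by the current multiplier.
        weight = 1.0 if y == 1 else lam
        step = lr * weight * grad_score
        w = [w_i - step * x_i for w_i, x_i in zip(w, x)]
        b -= step
        if y == -1:
            false_alarm = 1.0 if score > 0.0 else 0.0
            # Dual ascent, projected so that the multiplier stays non-negative.
            lam = max(0.0, lam + dual_lr * (false_alarm - target_fpr))
    return w, b, lam


def make_stream(n, rng):
    """Hypothetical synthetic stream: two Gaussian classes separated on feature 0."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else -1
        data.append(([rng.gauss(float(y), 1.0), rng.gauss(0.0, 1.0)], y))
    return data


if __name__ == "__main__":
    w, b, lam = np_online_sgd(make_stream(2000, random.Random(1)), dim=2, target_fpr=0.1)
    print("weights:", w, "bias:", b, "multiplier:", lam)
```

Each update touches only the current example and a fixed number of parameters, which is consistent with the constant-memory, single-pass behaviour described in the abstract. In the nonlinear case, `x` would be replaced by its random Fourier feature vector.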
Neural Computation, 2013
This review examines kernel methods for online learning, in particular, multiclass classification. We examine margin-based approaches, stemming from Rosenblatt's original perceptron algorithm, as well as nonparametric probabilistic approaches that are based on the popular Gaussian process framework. We also examine approaches to online learning that use combinations of kernels, that is, online multiple kernel learning. We present empirical validation of a wide range of methods on a protein fold recognition data set, where different biological feature types are available, and two object recognition data sets, Caltech101 and Caltech256, where multiple feature spaces are available in terms of different image feature extraction methods. Neural Computation 25, 567-625 (2013) © 2013 Massachusetts Institute of Technology
We apply kernel-based machine learning methods to online learning situations, and look at the related requirement of reducing the complexity of the learnt classifier. Online methods are particularly useful in situations which involve streaming data, such as medical or financial applications. We show that the concept of span of support vectors can be used to build a classifier that performs reasonably well while satisfying given space and time constraints, thus making it potentially suitable for such online situations.
2005
Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as is demonstrated for ALMA₂. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy.
Abstract: Nowadays we are faced with unbounded data sets, such as bank card transactions, for which traditional classification methods cannot be used because of the way the data arrive. For such data, the classification model must be built from a limited number of samples; each newly received sample is first classified, and the model is then improved using the actual label, which is obtained with a delay. This problem is known as online data classification. Among the effective ways to solve it are methods based on support vector machines, such as OISVM, ROSVM, LASVM and others. In this setting, classification accuracy, speed and memory are all very important. On the other hand, the final decision of a support vector machine depends only on the support vectors, which lie near the optimal hyperplane; all other samples are irrelevant to the decision and to the optimal hyperplane, in which case classification accuracy may be low. In this paper, to achieve the desired accuracy, speed and memory, we improve support vector machines by taking into account the distribution density of the samples and linearly independent vectors. The performance of the proposed method is evaluated on 10 datasets from the UCI and KEELS databases. Keywords: support vector machines, linear independent vector, relative density degree, online learning. International Journal of Computer Science and Information Security (IJCSIS), Vol. 13, No. 10, October 2015 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
Pattern Recognition, 2006
Nonlinear discriminant analysis may be transformed into the form of kernel-based discriminant analysis. Thus, the corresponding discriminant direction can be solved by linear equations. From the view of feature space, the nonlinear discriminant analysis is still a linear method, and it is provable that in feature space the method is equivalent to Fisher discriminant analysis. We consider that one linear combination of parts of training samples, called "significant nodes", can replace the total training samples to express the corresponding discriminant vector in feature space to some extent. In this paper, an efficient algorithm is proposed to determine "significant nodes" one by one. The principle of determining "significant nodes" is simple and reasonable, and the consequent algorithm can be carried out with acceptable computation cost. Depending on the kernel functions between test samples and all "significant nodes", classification can be implemented. The proposed method is called fast kernel-based nonlinear method (FKNM). Notably, the number of "significant nodes" may be much smaller than that of the total training samples. As a result, for two-class classification problems, the FKNM will be much more efficient than the naive kernel-based nonlinear method (NKNM). The FKNM can also be applied to multi-class problems via two approaches: one-against-the-rest and one-against-one. Although there is a view that one-against-one is superior to one-against-the-rest in classification efficiency, it seems that for the FKNM one-against-the-rest is more efficient than one-against-one. Experiments on benchmark and real datasets illustrate that, for two-class and multi-class classifications, the FKNM is effective, feasible and highly efficient.
In this paper, we present an extension to the optimal K-associated network classifier to perform online classification. The static classifier uses a special network, referred to as the optimal network, to classify a test pattern. This network is constructed through an iterative process which is based on the K-associated network and on a measure called purity. The good results obtained with the static classifier on stationary data sets have motivated the development of an incremental version. Knowing the network's capability of representing similarity relationships among patterns and data classes, here we present an extension that implements incremental learning to handle online classification for non-stationary data sets. Results on non-stationary data comparing the proposed method and two state-of-the-art ensemble classification methods are provided.
Studies in Applied Mathematics, 2010
Gaussians are important tools for learning from data of large dimensions. The variance of a Gaussian kernel is a measurement of the frequency range of function components or features retrieved by learning algorithms induced by the Gaussian. The learning ability and approximation power increase when the variance of the Gaussian decreases. Thus, it is natural to use Gaussians with decreasing variances for online algorithms when samples are imposed one by one. In this paper, we consider fully online classification algorithms associated with a general loss function and varying Gaussians which are closely related to regularization schemes in reproducing kernel Hilbert spaces. Learning rates are derived in terms of the smoothness of a target function associated with the probability measure controlling sampling and the loss function. A critical estimate is given for the norm of the difference of regularized target functions as the variance of the Gaussian changes. Concrete learning rates are presented for the online learning algorithm with the least square loss function.
2009
The batch nature limits the standard kernel principal component analysis (KPCA) methods in numerous applications, especially for dynamic or large-scale data. In this paper, an efficient adaptive approach is presented for online extraction of the kernel principal components (KPC). The contribution of this paper may be divided into two parts. First, the kernel covariance matrix is correctly updated to adapt to the changing characteristics of data. Second, KPC are recursively formulated to overcome the batch nature of standard KPCA. This formulation is derived from the recursive eigen-decomposition of the kernel covariance matrix and indicates the KPC variation caused by the new data. The proposed method not only alleviates sub-optimality of the KPCA method for non-stationary data, but also maintains constant update speed and memory usage as the data-size increases. Experiments for simulation data and real applications demonstrate that our approach yields improvements in terms of both computati...
Proceedings of the AAAI Conference on Artificial Intelligence
In Kernel-based Learning the targeted phenomenon is summarized by a set of explanatory examples derived from the training set. When the model size grows with the complexity of the task, such approaches are so computationally demanding that the adoption of comprehensive models is not always viable. In this paper, a general framework aimed at minimizing this problem is proposed: multiple classifiers are stratified and dynamically invoked according to increasing levels of complexity corresponding to incrementally more expressive representation spaces. Computationally expensive inferences are thus adopted only when the classification at lower levels is too uncertain over an individual instance. The application of complex functions is thus avoided where possible, with a significant reduction of the overall costs. The proposed strategy has been integrated within two well-known algorithms: Support Vector Machines and Passive-Aggressive Online classifier. A significant cost reduction (up to 90...
IEEE Transactions on Neural Networks, 2001
The eigenstructure of the second-order statistics of a multivariate random population can be inferred from the matrix of pairwise combinations of inner products of the samples. Therefore, it can be also efficiently obtained in the implicit, high-dimensional feature spaces defined by kernel functions. We elaborate on this property to obtain general expressions for immediate derivation of nonlinear counterparts of a number of standard pattern analysis algorithms, including principal component analysis, data compression and denoising, and Fisher's discriminant. The connection between kernel methods and nonparametric density estimation is also illustrated. Using these results we introduce the kernel version of Mahalanobis distance, which originates nonparametric models with unexpected and interesting properties, and also propose a kernel version of the minimum squared error (MSE) linear discriminant function. This learning machine is particularly simple and includes a number of generalized linear models such as the potential functions method or the radial basis function (RBF) network. Our results shed some light on the relative merit of feature spaces and inductive bias in the remarkable generalization properties of the support vector machine (SVM). Although in most situations the SVM obtains the lowest error rates, exhaustive experiments with synthetic and natural data show that simple kernel machines based on pseudoinversion are competitive in problems with appreciable class overlapping
IEEE Transactions on Information Theory, 2004
In this paper, it is shown how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on an independent and identically distributed (i.i.d.) sample of data. Using a simple large deviation argument, we prove tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic associated with the on-line performance of the ensemble. Via sharp pointwise bounds on , we then obtain risk tail bounds for kernel Perceptron algorithms in terms of the spectrum of the empirical kernel matrix. These bounds reveal that the linear hypotheses found via our approach achieve optimal tradeoffs between hinge loss and margin size over the class of all linear functions, an issue that was left open by previous results.
We consider pattern classification using a weighted sum of normalized kernel functions. Such schemes can be viewed as estimates of class a posteriori probabilities. We apply this regression method successfully to two real life pattern recognition problems.
2005
Typically two procedures are used in optimising classifiers. The first is cost-sensitive optimisation, in which given priors and costs, the optimal classifier weights/thresholds are specified corresponding to minimum loss, followed by model comparison. This procedure extends naturally to the multiclass case. The second is Neyman-Pearson optimisation, in which costs may not be certain, and the problem involves specification of one of the class errors, which subsequently fixes the corresponding error (in the two class case), followed by comparisons between different models. This optimisation is well understood in the two-class case, but less so in the multiclass case. In this paper we study the extension of Neyman-Pearson optimisation to the multiclass case, involving specifying various classification errors, and minimising the others. It is shown empirically that the optimisation can indeed be useful for the multiclass case, but obtaining a viable solution is only guaranteed if a single error is specified. Specifying more than one error may result in a solution depending on the data and classifier, which is determined via a multiclass ROC analysis framework.
IEEE transactions on pattern …, 2010
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2ν-SVM. We then exploit a characterization of the 2ν-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
Journal of Signal Processing Systems, 2011
For most practical supervised learning applications, the training datasets are often linearly nonseparable based on the traditional Euclidean metric. To strive for more effective classification capability, a new and flexible distance metric has to be adopted. There exist a great variety of kernel-based classifiers, each with their own favorable domain of applications. They are all based on a new distance metric induced from a kernel-based inner-product. It is also known that classifier's effectiveness depends strongly on the distribution of training and testing data. The problem lies in that we just do not know in advance the right models for the observation data and measurement noise. As a result, it is impossible to pinpoint an appropriate model for the best tradeoff between the classifier's training accuracy and error resilience. The objective of this paper is to develop a versatile classifier endowed with a broad array of parameters to cope with various kinds of real-world data. More specifically, a so-called PDA-SVM Hybrid is proposed as a unified model for kernel
International Conference on Acoustics, Speech, and Signal Processing, 2011
This paper presents an online feature selection and classification algorithm. The algorithm is implemented for impact acoustics signals to sort hazelnut kernels. The classifier, which is used to determine the most discriminative features, is updated when a new observation is processed. The algorithm starts with decomposing the signal both in time and frequency axes in binary tree format. A feature
Acoustics, Speech and …, 2006
We study the problem of designing support vector classifiers with respect to a Neyman-Pearson criterion. Specifically, given a user-specified level α ∈ (0, 1), how can we ensure a false alarm rate no greater than α while minimizing the miss rate? We examine two approaches, one based on shifting the offset of a conventionally trained SVM and the other based on the introduction of class-specific weights. Our contributions include a novel heuristic for improved error estimation and a strategy for efficiently searching the parameter space of the second method. We also provide a characterization of the feasible parameter set of the 2ν-SVM on which the second approach is based. The proposed methods are compared on four benchmark datasets.
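The first approach in this abstract, shifting the offset of an already trained classifier, can be illustrated with a small sketch. It assumes that held-out decision scores for negative-class examples are available and that an example is flagged as positive when its score is strictly greater than a threshold; the function names `np_threshold` and `false_alarm_rate` are hypothetical, and the paper's error estimation heuristic is not reproduced here.

```python
import math


def np_threshold(neg_scores, alpha):
    """Pick a threshold whose empirical false alarm rate is at most alpha.

    With the scores sorted in descending order, at most k = floor(alpha * n)
    negatives may lie strictly above the threshold, so the (k + 1)-th largest
    score is returned.
    """
    if not neg_scores:
        raise ValueError("need at least one negative-class score")
    ordered = sorted(neg_scores, reverse=True)
    k = int(math.floor(alpha * len(ordered)))
    if k >= len(ordered):
        # alpha >= 1: every negative may be flagged.
        return float("-inf")
    return ordered[k]


def false_alarm_rate(neg_scores, threshold):
    """Fraction of negative-class scores strictly above the threshold."""
    return sum(1 for s in neg_scores if s > threshold) / len(neg_scores)


if __name__ == "__main__":
    scores = [0.9, -1.2, 0.1, -0.4, 0.5, -2.0, -0.7, 0.3]
    t = np_threshold(scores, alpha=0.25)
    print("threshold:", t, "empirical false alarm rate:", false_alarm_rate(scores, t))
```

The guarantee here is only on the sample used to pick the threshold; the rate on new data can differ, which is why accurate error estimation matters for this criterion.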
2010
We propose a general framework for online learning in classification problems with time-varying potential functions in the adversarial setting. This framework allows one to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, which makes it possible to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that uses a hybrid of adaptive and fixed second-order information. We analyze the properties of the algorithm and illustrate its performance using a synthetic dataset.