2014, Neural Networks
…
Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense.
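The core idea, jointly minimising a penalised training criterion over both the expansion coefficients and ARD-style kernel parameters at the first level of inference, can be sketched as follows. This is a minimal illustration under assumed choices (a squared-error criterion, a log-width parametrisation, regularisation parameters mu and nu, and a generic quasi-Newton solver); it is not the authors' implementation.

```python
# Minimal sketch of first-level kernel learning (assumptions noted above).
import numpy as np
from scipy.optimize import minimize

def ard_rbf(X1, X2, log_widths):
    # Anisotropic RBF kernel with one length-scale per input dimension (ARD).
    scales = np.exp(log_widths)                      # positive widths via log parametrisation
    d = (X1[:, None, :] - X2[None, :, :]) / scales
    return np.exp(-0.5 * np.sum(d ** 2, axis=2))

def objective(params, X, y, mu, nu):
    n, p = X.shape
    alpha, b, log_widths = params[:n], params[n], params[n + 1:]
    K = ard_rbf(X, X, log_widths)
    f = K @ alpha + b
    loss = np.sum((y - f) ** 2)                      # least-squares training criterion
    reg_model = mu * alpha @ K @ alpha               # usual RKHS penalty on the classifier
    reg_kernel = nu * np.sum(log_widths ** 2)        # additional penalty on the kernel parameters
    return loss + reg_model + reg_kernel

def fit(X, y, mu=1e-2, nu=1e-1):
    # Only two regularisation parameters (mu, nu) remain for model selection.
    n, p = X.shape
    x0 = np.zeros(n + 1 + p)                         # alpha, b, log widths
    res = minimize(objective, x0, args=(X, y, mu, nu), method="L-BFGS-B")
    return res.x[:n], res.x[n], res.x[n + 1:]

# Toy usage: two informative features, two pure-noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.sign(X[:, 0] + X[:, 1])
alpha, b, log_widths = fit(X, y)
print(np.exp(log_widths))                            # larger learned widths indicate lower relevance
```

With only mu and nu left to tune, the outer model selection search is over a two-dimensional space regardless of the number of kernel parameters.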
The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in machine learning and artificial intelligence. These methods are growing in popularity and are already frequently applied in regression and classification problems. A special thank you also to my dad, Klopper Oosthuizen, for many investments in me, and for his love and support, and to my family and friends.
CONTENTS
CHAPTER 1: INTRODUCTION
1.1 NOTATION
1.2 OVERVIEW OF THE THESIS
CHAPTER 2: VARIABLE SELECTION FOR KERNEL METHODS
2.1 INTRODUCTION
2.2 AN OVERVIEW OF KERNEL METHODS
2.2.1 BASIC CONCEPTS
2.2.2 KERNEL FUNCTIONS AND THE KERNEL TRICK
2.2.3 CONSTRUCTING A KERNEL CLASSIFIER
2.2.4 A REGULARISATION PERSPECTIVE
2.3 VARIABLE SELECTION IN BINARY CLASSIFICATION: IMPORTANT ASPECTS
2.3.1 THE RELEVANCE OF VARIABLES
2.3.2 SELECTION STRATEGIES AND CRITERIA
2.4 VARIABLE SELECTION FOR KERNEL METHODS
2.4.1 THE NEED FOR VARIABLE SELECTION
2.4.2 COMPLICATING FACTORS AND POSSIBLE APPROACHES
2.5 SUMMARY
CHAPTER 3: KERNEL VARIABLE SELECTION IN INPUT SPACE
3.4 MONTE CARLO SIMULATION STUDY
3.4.1 EXPERIMENTAL DESIGN
3.4.2 STEPS IN EACH SIMULATION REPETITION
3.4.3 GENERATING THE TRAINING AND TEST DATA
3.4.4 HYPERPARAMETER SPECIFICATION
3.4.5 THE VARIABLE SELECTION PROCEDURES
3.4.6 RESULTS AND CONCLUSIONS
3.5 SUMMARY
CHAPTER 4: ALGORITHM-INDEPENDENT AND ALGORITHM-DEPENDENT SELECTION IN FEATURE SPACE
4.1 INTRODUCTION
4.2 SUPPORT VECTOR MACHINES
4.2.1 THE TRAINING DATA ARE LINEARLY SEPARABLE IN INPUT SPACE
4.2.2 THE TRAINING DATA ARE LINEARLY SEPARABLE IN FEATURE SPACE
4.2.3 HANDLING NOISY DATA
4.3 KERNEL FISHER DISCRIMINANT ANALYSIS
4.3.1 LINEAR DISCRIMINANT ANALYSIS
4.3.2 THE KERNEL FISHER DISCRIMINANT FUNCTION
IRJET, 2022
The support vector machine (SVM) is capable of outperforming most other learning algorithms in terms of accuracy and other performance metrics, thanks to the high-dimensional projection of the data it uses for classification. Nevertheless, the performance of the SVM is greatly affected by the choice of the kernel function that performs this projection. This paper discusses the working of the SVM and its dependence on the kernel function, along with an explanation of the main types of kernels. The focus is on choosing the optimal kernel for three datasets that differ in the number of features and classes, in order to determine the best kernel choice for each of the three datasets. For performance measures, we used metrics such as accuracy, kappa, specificity and sensitivity. This study statistically examines and compares each type of kernel against the mentioned metrics.
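As a rough illustration of this kind of comparison, the sketch below evaluates the standard SVM kernels on a single dataset using cross-validated accuracy and Cohen's kappa (specificity and sensitivity could be read off the confusion matrix in the same way). The dataset and the fixed value of C are placeholders, not the paper's experimental setup.

```python
# Illustrative kernel comparison with scikit-learn (placeholder dataset and C).
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    pred = cross_val_predict(clf, X, y, cv=5)        # 5-fold cross-validated predictions
    print(f"{kernel:8s} accuracy={accuracy_score(y, pred):.3f} "
          f"kappa={cohen_kappa_score(y, pred):.3f}")
```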
Journal of Machine Learning Research, 2010
The problem of automatic feature selection/weighting in kernel methods is examined. We work on a formulation that optimizes both the weights of features and the parameters of the kernel model simultaneously, using L1 regularization for feature selection. Under quite general choices of kernels, we prove that there exists a unique regularization path for this problem, that runs from 0 to a stationary point of the non-regularized problem. We propose an ODE-based homotopy method to follow this trajectory. By following the path, our algorithm is able to automatically discard irrelevant features and to automatically go back and forth to avoid local optima. Experiments on synthetic and real datasets show that the method achieves low prediction error and is efficient in separating relevant from irrelevant features.
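The underlying objective, non-negative feature weights inside the kernel penalised by their L1 norm and optimised jointly with the model coefficients, can be sketched as below. The simple bound-constrained solver stands in for the ODE-based homotopy path-following of the paper, and all names and values are illustrative.

```python
# Joint feature weighting and fitting with an L1 penalty on the weights (sketch only).
import numpy as np
from scipy.optimize import minimize

def weighted_rbf(X1, X2, beta):
    # k(x, x') = exp(-sum_j beta_j (x_j - x'_j)^2), with feature weights beta_j >= 0.
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-d2 @ beta)

def objective(params, X, y, lam, mu):
    n, p = X.shape
    alpha, beta = params[:n], params[n:]
    K = weighted_rbf(X, X, beta)
    resid = y - K @ alpha
    # For beta >= 0 the L1 norm is simply the sum of the weights.
    return resid @ resid + mu * alpha @ K @ alpha + lam * np.sum(beta)

def fit(X, y, lam=1.0, mu=1e-2):
    n, p = X.shape
    x0 = np.concatenate([np.zeros(n), np.full(p, 0.5)])
    bounds = [(None, None)] * n + [(0.0, None)] * p  # keep the feature weights non-negative
    res = minimize(objective, x0, args=(X, y, lam, mu), method="L-BFGS-B", bounds=bounds)
    return res.x[:n], res.x[n:]

# Toy usage: only the first two of five features are relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = np.sign(X[:, 0] - X[:, 1])
alpha, beta = fit(X, y)
print(np.round(beta, 3))                             # irrelevant features are driven towards zero weight
```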
2010
This paper presents a novel feature selection approach (KP-SVR) that determines a non-linear regression function with minimal error while simultaneously minimizing the number of features by penalizing their use in the dual formulation of SVR. The approach optimizes the widths of an anisotropic RBF kernel using an iterative algorithm based on gradient descent, eliminating features that have low relevance for the regression model. Our approach provides an explicit stopping criterion, indicating clearly when eliminating further features begins to negatively affect the model's performance. Experiments on two real-world benchmark problems demonstrate that our approach achieves the best performance compared with well-known feature selection methods, while consistently using a small number of features.
Machine Learning, 2006
This paper presents a convex optimization perspective on the task of tuning the regularization trade-off with validation and cross-validation criteria in the context of kernel machines. We focus on the problem of tuning the regularization trade-off in the context of Least Squares Support Vector Machines (LS-SVMs) for function approximation and classification. By adopting an additive regularization trade-off scheme, the task of tuning the regularization trade-off with respect to a validation or cross-validation criterion can be written as a convex optimization problem. The solution of this problem then contains both the optimal regularization constants with respect to the model selection criterion at hand and the corresponding training solution. We refer to such formulations as the fusion of training with model selection. The major tool used to accomplish this task is the primal-dual derivations occurring in convex optimization theory. The paper advances the discussion by relating the additive regularization trade-off scheme to the classical Tikhonov scheme. Motivations are given for the usefulness of the former scheme. Furthermore, it is illustrated how to restrict the additive trade-off scheme to the solution path corresponding to a Tikhonov scheme while retaining convexity of the overall problem of fusing model selection and training. We relate such a scheme to an ensemble learning problem and to the stability of learning machines. The approach is illustrated on a number of artificial and benchmark datasets, relating the proposed method to the classical practice of tuning the Tikhonov scheme with a cross-validation measure.
Journal of Machine Learning Research, 2007
While the model parameters of a kernel machine are typically given by the solution of a convex optimisation problem, with a single global optimum, the selection of good values for the regularisation and kernel parameters is much less straightforward. Fortunately, the leave-one-out cross-validation procedure can be performed, or at least approximated, very efficiently in closed form for a wide variety of kernel learning methods, providing a convenient means for model selection. Leave-one-out cross-validation based estimates of performance, however, generally exhibit a relatively high variance and are therefore prone to over-fitting. In this paper, we investigate the novel use of Bayesian regularisation at the second level of inference, adding a regularisation term to the model selection criterion corresponding to a prior over the hyper-parameter values, where the additional regularisation parameters are integrated out analytically. Results obtained on a suite of thirteen real-world and synthetic benchmark data sets clearly demonstrate the benefit of this approach.
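For kernel ridge regression (closely related to the LS-SVM), the leave-one-out residuals have the well-known closed form r_i = (y_i - f_i) / (1 - H_ii), where H = K(K + lambda I)^{-1} is the hat matrix. The sketch below uses this to compute a PRESS-style criterion for a few candidate regularisation parameters; it illustrates only the efficient LOO estimate such methods build on, not the Bayesian regularisation of the selection criterion proposed in the paper.

```python
# Closed-form leave-one-out residuals for kernel ridge regression (PRESS criterion).
import numpy as np

def loo_residuals(K, y, lam):
    # Hat matrix H = K (K + lam*I)^(-1); exact LOO residual_i = (y_i - f_i) / (1 - H_ii).
    n = K.shape[0]
    H = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    f = H @ y
    return (y - f) / (1.0 - np.diag(H))

# Toy usage: choose lambda by minimising the PRESS statistic.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
K = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
for lam in (1e-3, 1e-1, 1e1):
    press = np.sum(loo_residuals(K, y, lam) ** 2)
    print(f"lambda={lam:g}  PRESS={press:.3f}")
```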
Pattern Recognition Letters, 2013
In several supervised learning applications, reconstruction methods have to be applied repeatedly before the final solution can be reached. In these situations, the availability of learning algorithms able to provide effective predictors in a very short time may lead to remarkable reductions in the overall computational requirements. In this paper we consider the kernel ridge regression problem and we look for solutions given by a linear combination of kernel functions plus a constant term. In particular, we show that the unknown coefficients of the linear combination and the constant term can be obtained very quickly by applying specific regularization algorithms directly to the linear system arising from the Empirical Risk Minimization problem. From the numerical experiments carried out on benchmark datasets, we observed that in some cases results that previously required hours of computation can be obtained in a few seconds, showing that these strategies are very well suited to time-consuming applications.
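A minimal sketch of the idea of applying an iterative regularisation method directly to the kernel linear system: here an early-stopped conjugate gradient solve, with the constant term handled crudely by centring the targets. This is just one simple choice of algorithm under assumed settings, not the specific strategies evaluated in the paper.

```python
# Early-stopped conjugate gradient on the kernel system as a fast, implicitly
# regularised solver (sketch; constant term handled by centring the targets).
import numpy as np
from scipy.sparse.linalg import cg

def rbf_kernel(X1, X2, gamma=0.5):
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=200)

K = rbf_kernel(X, X)
b = y.mean()                                              # constant term via centring
alpha, _ = cg(K + 1e-8 * np.eye(200), y - b, maxiter=30)  # early stopping acts as regularisation
pred = K @ alpha + b
print(f"training RMSE after 30 CG iterations: {np.sqrt(np.mean((pred - y) ** 2)):.4f}")
```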
International Joint Conference on Neural Networks (IJCNN 2007), 2007
The generalised linear model (GLM) is the standard approach in classical statistics for regression tasks where it is appropriate to measure the data misfit using a likelihood drawn from the exponential family of distributions. In this paper, we apply the kernel trick to give a non-linear variant of the GLM, the generalised kernel machine (GKM), in which a regularised GLM is constructed in a fixed feature space implicitly defined by a Mercer kernel. The MATLAB symbolic maths toolbox is used to automatically create a suite of generalised kernel machines, including methods for automated model selection based on approximate leave-one-out cross-validation. In doing so, we provide a common framework encompassing a wide range of existing and novel kernel learning methods, and highlight their connections with earlier techniques from classical statistics. Examples including kernel ridge regression, kernel logistic regression and kernel Poisson regression are given to demonstrate the flexibility and utility of the generalised kernel machine.
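As one concrete instance of a generalised kernel machine, the sketch below fits a kernel logistic regression by iteratively reweighted least squares (IRLS). The bias term is omitted for brevity and the code is a generic illustration, not the MATLAB toolbox described in the paper.

```python
# Kernel logistic regression via IRLS (a GLM with a Mercer kernel; bias omitted).
import numpy as np

def kernel_logistic_irls(K, y, lam=1e-1, iters=30):
    # Model: p_i = sigmoid((K alpha)_i), penalty (lam/2) alpha' K alpha.
    # Each IRLS/Newton step solves (W K + lam I) alpha = W z.
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(iters):
        eta = K @ alpha
        p = 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))
        W = p * (1.0 - p) + 1e-12                     # IRLS weights
        z = eta + (y - p) / W                         # working response
        alpha = np.linalg.solve(W[:, None] * K + lam * np.eye(n), W * z)
    return alpha

# Toy usage with an RBF kernel and labels in {0, 1}.
rng = np.random.default_rng(4)
X = rng.normal(size=(80, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
K = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
alpha = kernel_logistic_irls(K, y, lam=1.0)
print(f"training accuracy: {np.mean(((K @ alpha) > 0) == (y > 0.5)):.2f}")
```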
Neurocomputing, 2003
We address the problem of model selection for Support Vector Machine (SVM) classification. For a fixed functional form of the kernel, model selection amounts to tuning the kernel parameters and the slack penalty coefficient C. We begin by reviewing a recently developed probabilistic framework for SVM classification. An extension to the case of SVMs with quadratic slack penalties is given and a simple
2009
A Bayesian learning algorithm is presented that is based on a sparse Bayesian linear model (the Relevance Vector Machine (RVM)) and learns the parameters of the kernels during model training. The novel characteristic of the method is that it enables the introduction of parameters called 'scaling factors' that measure the significance of each feature. Using the Bayesian framework, a sparsity-promoting prior is then imposed on the scaling factors in order to eliminate irrelevant features.