Learning with Lq<1 vs L1-norm regularisation with exponentially many irrelevant features
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008
We study the use of fractional norms for regularisation in supervised learning from high-dimensional data, in conditions of a large number of irrelevant features, focusing on logistic regression. We develop a variational method for parameter estimation, and show an equivalence between two approximations recently proposed in the statistics literature. Building on previous work by A. Ng, we show that fractional-norm regularised logistic regression enjoys a sample complexity that grows logarithmically with the data dimension and polynomially with the number of relevant dimensions. In addition, extensive empirical testing indicates that fractional-norm regularisation is more suitable than L1 in cases when the number of relevant features is very small, and works very well despite a large number of irrelevant features.

1 Lq<1-Regularised Logistic Regression

Consider a training set of pairs $z = \{(x_j, y_j)\}_{j=1}^{n}$ drawn i.i.d. from some unknown distribution $P$, where $x_j \in \mathbb{R}^m$ are $m$-dimensional input points and $y_j \in \{-1, 1\}$ are the associated target labels for these points. Given $z$, the aim in supervised learning is to learn a mapping from inputs to targets that is then able to predict the target values for previously unseen points that follow the same distribution as the training data. We are interested in problems with a large number $m$ of input features, of which only a few, $r \ll m$, are relevant to the target. In particular, we focus on a form of regularised logistic regression for this purpose:

$$\max_{w} \; \sum_{j=1}^{n} \log p(y_j \mid x_j, w) \qquad (1)$$
$$\text{subject to } \|w\|_q \le A \qquad (2)$$

or, in the Lagrangian formulation:

$$\max_{w} \; \sum_{j=1}^{n} \log p(y_j \mid x_j, w) \;-\; \alpha \|w\|_q^q \qquad (3)$$
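As a concrete illustration of the penalised objective (3), here is a minimal NumPy sketch that optimises the negative log-likelihood plus a smoothed Lq penalty by plain gradient descent. This is not the variational method developed in the paper; the smoothing constant `eps`, the step size, the iteration count and the toy data are illustrative assumptions.

```python
import numpy as np

def lq_logistic_regression(X, y, q=0.5, alpha=0.1, lr=0.01, iters=2000, eps=1e-8):
    """X: (n, m) inputs, y: (n,) labels in {-1, +1}.
    Minimises sum_j log(1 + exp(-y_j w^T x_j)) + alpha * ||w||_q^q,
    with |w_i|^q smoothed as (w_i^2 + eps)^(q/2) to allow gradient descent."""
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(iters):
        margins = y * (X @ w)
        # gradient of the negative log-likelihood
        grad_nll = -(X.T @ (y / (1.0 + np.exp(margins))))
        # gradient of the smoothed penalty alpha * sum_i (w_i^2 + eps)^(q/2)
        grad_pen = alpha * q * w * (w ** 2 + eps) ** (q / 2 - 1)
        w -= lr * (grad_nll + grad_pen)
    return w

# toy usage: 200 points, 50 features, only 3 of them relevant
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
w_hat = lq_logistic_regression(X, y)
```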
With the advent of high-throughput technologies, ℓ1-regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that ℓ1-regularized learning problems can be easily solved by using gradient-descent techniques. The basic idea is to transform a convex optimization problem with a non-differentiable objective function into an unconstrained non-convex problem, upon which, via gradient descent, reaching a globally optimal solution is guaranteed. We present a detailed implementation of the algorithm using ℓ1-regularized logistic regression as a particular application. We conduct large-scale experiments to compare the new approach with other state-of-the-art algorithms on eight medium and large-scale problems. We demonstrate that our algorithm, though simple, performs similarly or even better than other advanced algorithms in terms of…
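One well-known transformation of this kind is the squared-variable reparameterisation; whether it is the exact construction used in the paper is an assumption, so the sketch below is only an illustration of the general idea (smooth objective, plain gradient descent). Step size, iteration count and initialisation are arbitrary choices.

```python
import numpy as np

def l1_logreg_smooth(X, y, lam=0.1, lr=0.05, iters=3000, seed=0):
    """Gradient descent on a smooth reparameterisation of l1-regularised
    logistic regression: w = u*u - v*v, penalty lam * sum(u*u + v*v).
    At a minimiser one of u_i, v_i is zero, so the penalty equals lam*||w||_1."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    u = 0.01 * rng.standard_normal(d)   # small nonzero start (u = v = 0 is stationary)
    v = 0.01 * rng.standard_normal(d)
    for _ in range(iters):
        w = u * u - v * v
        p = 1.0 / (1.0 + np.exp(y * (X @ w)))   # sigma(-y * Xw)
        g = -(X.T @ (y * p)) / n                # gradient of the mean logistic loss in w
        u -= lr * (2 * u * (g + lam))           # chain rule through w = u^2 - v^2
        v -= lr * (2 * v * (-g + lam))
    return u * u - v * v
```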
2009
There is a growing body of learning problems for which it is natural to organize the parameters into a matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose more sophisticated prior knowledge). This work describes and analyzes a systematic method for constructing such matrix-based regularization methods. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality fact: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and kernel learning.
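For reference, the duality fact alluded to can be stated as follows (standard formulation; the notation here is ours, not the paper's):

```latex
% f is beta-strongly convex w.r.t. a norm  <=>  its conjugate f* is (1/beta)-strongly
% smooth w.r.t. the dual norm, where f*(theta) = sup_w <theta, w> - f(w).
\[
  f \text{ is } \beta\text{-strongly convex w.r.t. } \|\cdot\|
  \iff
  f^{*} \text{ is } \tfrac{1}{\beta}\text{-strongly smooth w.r.t. } \|\cdot\|_{*},
\]
\[
  \text{where } f\big(\alpha u + (1-\alpha)w\big) \le \alpha f(u) + (1-\alpha) f(w)
  - \tfrac{\beta}{2}\,\alpha(1-\alpha)\,\|u - w\|^{2},
\]
\[
  \text{and } f^{*}(u) \le f^{*}(w) + \langle \nabla f^{*}(w),\, u - w\rangle
  + \tfrac{1}{2\beta}\,\|u - w\|_{*}^{2}.
\]
```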
2006
We consider learning algorithms induced by regularization methods in the regression setting. We show that previously obtained error bounds for these algorithms, derived using a-priori choices of the regularization parameter, can be attained using a suitable a-posteriori choice based on validation. In particular, these results prove adaptation of the rate of convergence of the estimators to the minimax rate induced by the "effective dimension" of the problem. We also show universal consistency for this class of methods.
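A minimal sketch of such an a-posteriori (hold-out) choice of the regularization parameter, using plain ridge regression as the regularisation method; the estimator, grid and split are illustrative assumptions, not the setting analysed in the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: minimise (1/n)||Xw - y||^2 + lam * ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

def holdout_choice(X, y, lambdas, val_frac=0.3, seed=0):
    """Pick the regularization parameter minimising hold-out squared error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(val_frac * len(y))
    val, tr = idx[:n_val], idx[n_val:]
    best = None
    for lam in lambdas:
        w = ridge_fit(X[tr], y[tr], lam)
        err = np.mean((X[val] @ w - y[val]) ** 2)
        if best is None or err < best[0]:
            best = (err, lam, w)
    return best[1], best[2]   # chosen lambda and the corresponding estimator
```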
The last equality then leads to (I). Next, we discuss the implementation of evaluating $L_j(0)$, as it is the main operation at each inner iteration. As mentioned in Section 6.2, GLMNET explicitly normalizes $x_i$ by (6.2). Here we use $L_j(0; \bar{X})$ to denote $L_j(0)$ on the scaled data. If we define
$$\bar{y}_i = \begin{cases} 1 & \text{if } y_i = 1, \\ 0 & \text{if } y_i = -1, \end{cases}$$
then $L_j(0; \bar{X})$ can be computed by
We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration. We consider a nonparametric setting, in the framework of reproducing kernel Hilbert spaces, and prove finite sample bounds on the excess risk under general regularity conditions. Our study provides a new class of efficient regularized learning algorithms and gives insights on the interplay between statistics and optimization in machine learning.
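A minimal sketch of the idea: iterate an unpenalised empirical (sub)gradient method and regularise only by stopping early. The hinge loss, plain NumPy vectors (rather than the paper's reproducing kernel Hilbert space setting), and the validation-based stopping rule are illustrative assumptions.

```python
import numpy as np

def early_stopped_subgradient(X, y, Xval, yval, lr=0.01, max_iter=500):
    """Subgradient descent on the (unregularised) mean hinge loss; the number
    of iterations plays the role of the regularisation parameter."""
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.inf
    for _ in range(max_iter):
        margins = y * (X @ w)
        g = -(X.T @ (y * (margins < 1))) / len(y)   # subgradient of the mean hinge loss
        w -= lr * g
        err = np.mean(yval * (Xval @ w) < 0)        # validation 0-1 error
        if err < best_err:                          # keep the best early-stopped iterate
            best_err, best_w = err, w.copy()
    return best_w
```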
Electronics, 2022
Over the last decade, learning theory has made significant progress in the development of sophisticated algorithms and their theoretical foundations. The theory builds on concepts that exploit ideas and methodologies from mathematical areas such as optimization theory. Regularization is probably the key to addressing the challenging problem of overfitting, which usually occurs in high-dimensional learning. Its primary goal is to make the machine learning algorithm "learn", rather than "memorize", by penalizing the algorithm so as to reduce its generalization error and avoid the risk of overfitting. As a result, the variance of the model is significantly reduced, without a substantial increase in its bias and without losing any important properties of the data.
2009
In recent years the ℓ1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the ℓ1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of ℓ1,∞ regularized problems. The main challenge in developing such a method resides in being able to compute efficient projections onto the ℓ1,∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that ℓ1,∞ leads to better performance than both ℓ2 and ℓ1 regularization and that it is effective in discovering jointly sparse solutions.
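To make the projection step concrete, here is a simple nested-bisection routine that computes the Euclidean projection onto the ℓ1,∞ ball (rows indexing tasks, columns indexing features). It is not the O(n log n) algorithm of the paper, only an illustration of the projection problem that algorithm solves; function names, tolerances and iteration counts are assumptions.

```python
import numpy as np

def project_l1_inf(A, C):
    """Project matrix A onto {W : sum_j max_i |W_ij| <= C} (the l1,inf ball)."""
    absA = np.abs(A)
    if absA.max(axis=0).sum() <= C:      # already feasible
        return A.copy()

    def mu_for_theta(col, theta):
        # per-column radius mu solving sum_i max(col_i - mu, 0) = theta (by bisection)
        if col.sum() <= theta:
            return 0.0
        lo, hi = 0.0, col.max()
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if np.maximum(col - mid, 0.0).sum() > theta:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # outer bisection on the multiplier theta so that sum_j mu_j(theta) = C
    lo, hi = 0.0, absA.sum(axis=0).max()
    for _ in range(60):
        theta = 0.5 * (lo + hi)
        mu = np.array([mu_for_theta(absA[:, j], theta) for j in range(A.shape[1])])
        if mu.sum() > C:
            lo = theta
        else:
            hi = theta
    mu = np.array([mu_for_theta(absA[:, j], 0.5 * (lo + hi)) for j in range(A.shape[1])])
    # clip each column to its radius mu_j, keeping signs
    return np.sign(A) * np.minimum(absA, mu)
```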
Computational Optimization and Applications, 2016
We study the performance of first- and second-order optimization methods for ℓ1-regularized sparse least-squares problems as the conditioning of the problem changes and the dimensions of the problem increase up to one trillion. A rigorously defined generator is presented which allows control of the dimensions, the conditioning and the sparsity of the problem. The generator has very low memory requirements and scales well with the dimensions of the problem. Keywords: ℓ1-regularised least-squares • First-order methods • Second-order methods • Sparse least-squares instance generator • Ill-conditioned problems
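A toy version of such a generator is sketched below to illustrate the three knobs (dimensions, sparsity, conditioning). It builds a dense matrix with a prescribed geometric singular-value decay, so it is only an illustration and not the rigorously defined, low-memory generator of the paper; names and default values are assumptions.

```python
import numpy as np

def toy_lasso_instance(m=200, n=500, k=20, cond=1e4, noise=1e-3, seed=0):
    """Toy sparse least-squares instance (A, b, x_true) with approximate control
    of cond(A) via a geometric singular-value decay from 1 down to 1/cond."""
    rng = np.random.default_rng(seed)
    r = min(m, n)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    s = cond ** (-np.arange(r) / (r - 1))       # prescribed singular values
    A = (U * s) @ V.T
    x_true = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)  # k-sparse ground truth
    x_true[idx] = rng.standard_normal(k)
    b = A @ x_true + noise * rng.standard_normal(m)
    return A, b, x_true
```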
Journal of Machine Learning Research, 2010
ℓ1-regularized logistic regression, also known as sparse logistic regression, is widely used in machine learning, computer vision, data mining, bioinformatics and neural signal processing. The use of ℓ1 regularization gives the classifier attractive properties, such as feature selection, robustness to noise, and, as a result, classifier generality in the context of supervised learning. When a sparse logistic regression problem has large-scale data in high dimensions, it is computationally expensive to minimize the non-differentiable ℓ1-norm in the objective function. Motivated by recent work, we propose a novel hybrid algorithm based on combining two types of optimization iterations: one being very fast and memory friendly while the other is slower but more accurate. Called hybrid iterative shrinkage (HIS), the resulting algorithm is comprised of a fixed point continuation phase and an interior point phase. The first phase is based completely on memory efficient operations such as matrix-vector multiplications, while the second phase is based on a truncated Newton method. Furthermore, we show that various optimization techniques, including line search and continuation, can significantly accelerate convergence. The algorithm has global convergence at a geometric rate (a Q-linear rate in optimization terminology). We present a numerical comparison with several existing algorithms, including an analysis using benchmark data from the UCI machine learning repository, and show our algorithm is the most computationally efficient without loss of accuracy.
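The matrix-vector-product-plus-shrinkage structure of the first phase can be pictured with a generic proximal-gradient (iterative soft-thresholding) sketch for ℓ1-regularised logistic regression. This is not the HIS algorithm itself (no continuation, line search or interior-point phase); the step-size rule and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_logreg(X, y, lam=0.1, step=None, iters=1000):
    """Iterative shrinkage for min_w (1/n) sum_j log(1+exp(-y_j x_j^T w)) + lam*||w||_1.
    Each iteration uses only matrix-vector products and one soft-threshold."""
    n, d = X.shape
    if step is None:
        # 1/L with L >= ||X||_2^2 / (4n), an upper bound on the gradient's Lipschitz constant
        step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(y * (X @ w)))   # sigma(-y * Xw)
        grad = -(X.T @ (y * p)) / n             # gradient of the mean logistic loss
        w = soft_threshold(w - step * grad, step * lam)
    return w
```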