
Clarification of stopping criteria tol of iterative solvers #22243

@sanjaradylov

Description


Describe the issue linked to the documentation

The documentation of several supervised-learning models omits details about their tolerance-based early stopping. Two examples are LogisticRegression and RidgeClassifier:

LogisticRegression:

    tol : float, default=1e-4
        Tolerance for stopping criteria.

RidgeClassifier:

    tol : float, default=1e-3
        Precision of the solution.

My concern is that, without a detailed explanation of what tol means, many ML practitioners tend to tune solver as a hyperparameter along with it. But different solvers use different convergence conditions and hence have different sensible ranges for tol. For example, I assume that liblinear, when it uses coordinate descent, checks the duality gap, while saga compares the current loss value (or a gradient/coefficient norm) against the best so far. If so, it is at least theoretically inconsistent to search over a grid of the form

{'solver': ['s_1', 's_2', ..., 's_N'],
 'tol': [t_1, t_2, ..., t_M]}

and perhaps as redundant as, say, optimizing the degree of a polynomial kernel while using an RBF-kernel SVC. More generally, the optimization methods working behind the scenes, e.g., (quasi-)Newton and conjugate-gradient methods, remain black boxes.
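
To make the concern concrete, here is a minimal sketch of such a search (an illustration only, not taken from any model's docs: make_classification stands in for a real dataset, and the solver/tol values are arbitrary). GridSearchCV will happily rank, say, (liblinear, 1e-4) against (saga, 1e-4), even though the two solvers check different convergence conditions:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Each solver checks its own stopping condition, so the same tol value
# does not mean the same thing across the rows of this grid.
param_grid = {
    'solver': ['liblinear', 'saga'],
    'tol': [1e-2, 1e-3, 1e-4],
}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)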

Suggest a potential alternative/fix

If it makes sense, I would love to contribute, but as far as I can tell, solvers other than SGD aren't formulated mathematically in the docs, so I can't be certain about their specific implementation details. Ideally, we could shed some light on the mathematical details of the other optimization methods in the User Guide and explicitly state which quantity the (hyper)parameter tol is compared against in each particular solver/model.
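
A quick way to see that the criteria differ in practice (again only a hedged sketch on assumed synthetic data) is to compare how many iterations each solver runs before declaring convergence at the same tol; the counts are not directly comparable across solvers precisely because the stopping conditions differ:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

for solver in ['liblinear', 'lbfgs', 'saga']:
    clf = LogisticRegression(solver=solver, tol=1e-4, max_iter=10000)
    clf.fit(X, y)
    # n_iter_ reports how many iterations ran before this solver's own
    # stopping criterion fired.
    print(solver, clf.n_iter_)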
