
Scaling issues in l-bfgs for LogisticRegression #15556

@amueller

Description


So it looks like l-bfgs is very sensitive to scaling of the data, which can lead to convergence issues.
I feel like we might be able to fix this by changing the framing of the optimization?

example:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)
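# Default solver is lbfgs with max_iter=100, which warns here on the unscaled one-hot features.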
cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)

This gives convergence warnings; after scaling it doesn't. I have seen this in many places. While people should scale their data, I don't think a warning about the number of iterations is a good thing to show to the user. If we can fix this, I think we should.
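For reference, a minimal sketch of the scaled variant (same data_id=1590, with the one-hot encoded features passed through scale), which runs without the warning:

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)
# Standardize the one-hot encoded features; lbfgs now converges within
# the default max_iter=100 and no ConvergenceWarning is raised.
X_scaled = scale(pd.get_dummies(data.data))
cross_val_score(LogisticRegression(), X_scaled, data.target)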

Using the bank campaign data, I got coefficients that were quite different when I increased the number of iterations (I got convergence warnings with the default of 100). If I scaled the data, that issue went away.
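Roughly, the comparison looks like this (a sketch only: data_id=1461, the OpenML bank-marketing dataset, is assumed here for the bank campaign data, and the larger max_iter value is arbitrary):

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

bank = fetch_openml(data_id=1461, as_frame=True)  # assumed bank campaign data
X = pd.get_dummies(bank.data)
y = bank.target

# Unscaled: warns at the default max_iter=100, and the coefficients still
# move noticeably when the iteration budget is increased.
coef_default = LogisticRegression(max_iter=100).fit(X, y).coef_
coef_long = LogisticRegression(max_iter=5000).fit(X, y).coef_
print("max |change|, unscaled:", np.abs(coef_default - coef_long).max())

# Scaled: converges within the default budget and the coefficients agree.
Xs = scale(X)
coef_default_s = LogisticRegression(max_iter=100).fit(Xs, y).coef_
coef_long_s = LogisticRegression(max_iter=5000).fit(Xs, y).coef_
print("max |change|, scaled:", np.abs(coef_default_s - coef_long_s).max())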
