So it looks like L-BFGS is very sensitive to the scaling of the data, which can lead to convergence issues.
I feel like we might be able to fix this by changing the framing of the optimization?
Example:

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale

data = fetch_openml(data_id=1590, as_frame=True)
cross_val_score(LogisticRegression(), pd.get_dummies(data.data), data.target)
```

This gives convergence warnings; after scaling the data it doesn't. I have seen this in many places. While people should scale their data, I don't think a warning about the number of iterations is a good thing to show the user. If we can fix this, I think we should.
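For comparison, a minimal sketch of the scaled variant, reusing the `scale` import from above (a `Pipeline` with `StandardScaler` would be the more idiomatic fix, but this keeps the example short):

```python
# Standardizing the features first makes the convergence warnings go away.
cross_val_score(LogisticRegression(), scale(pd.get_dummies(data.data)),
                data.target)
```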
Using the bank campaign data, I got coefficients that were quite different when I increased the number of iterations (with the default of 100 I got convergence warnings). If I scaled the data, that issue went away.
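A sketch of that comparison (not the exact bank-campaign run; it reuses the adult dataset from above to stay self-contained):

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

data = fetch_openml(data_id=1590, as_frame=True)
X = pd.get_dummies(data.data)

# Coefficients at the default iteration budget vs. a much larger one;
# on unscaled data these can differ noticeably because L-BFGS stops early.
coef_100 = LogisticRegression(max_iter=100).fit(X, data.target).coef_
coef_10k = LogisticRegression(max_iter=10000).fit(X, data.target).coef_
print(abs(coef_100 - coef_10k).max())
```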