A possibly related issue to #3020 is that Ridge(normalize=True) breaks the sample weight invariance property, namely that removing samples should be equivalent to setting their sample weights to 0. For instance, running the check_sample_weights_invariance(kind=zeros) estimator check for Ridge(normalize=True) in #17441 fails by a significant amount for all solvers (a standalone reproduction is sketched after the test output):
```
$ pytest sklearn/tests/test_common_non_default.py -k "Ridge and normalize and check_sample_weights_invariance" -v
[..]
________________ test_common_non_default[Ridge(normalize=True)-check_sample_weights_invariance(kind=zeros)] _________________
[..]
>       assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)
E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=1e-09
E       For Ridge, a zero sample_weight is not equivalent to removing the sample
E       Mismatch: 100%
E       Max absolute difference: 0.13578947
E       Max relative difference: 0.10287081
E        x: array([1.184211, 1.184211, 1.184211, 1.184211, 1.710526, 1.710526,
E               1.710526, 1.710526, 1.289474, 1.289474, 1.289474, 1.289474,
E               1.815789, 1.815789, 1.815789, 1.815789])
E        y: array([1.32, 1.32, 1.32, 1.32, 1.6 , 1.6 , 1.6 , 1.6 , 1.4 , 1.4 , 1.4 ,
E               1.4 , 1.68, 1.68, 1.68, 1.68])
```
This makes some sense, since StandardScaler, for instance, does not account for sample weights, and normalize likely doesn't either. But then I don't understand why this happens for Ridge and not for LinearRegression, Lasso, or ElasticNet.
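For comparison, a quick sketch running the same zero-weight check across the estimators mentioned above. This assumes a version where Lasso and ElasticNet accept sample_weight in fit (added around 0.23) and where normalize is still available; the small alpha is an arbitrary choice so the penalized models keep non-trivial coefficients.

```python
# Run the zero-weight vs. dropped-samples comparison for several linear models
# with normalize=True and print the largest coefficient discrepancy for each.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)
sw = np.ones(20)
sw[-5:] = 0.0  # zero out the last 5 samples

estimators = [
    Ridge(normalize=True),
    LinearRegression(normalize=True),
    Lasso(alpha=0.01, normalize=True),        # alpha chosen arbitrarily
    ElasticNet(alpha=0.01, normalize=True),   # alpha chosen arbitrarily
]

for est in estimators:
    coef_zero_weight = est.fit(X, y, sample_weight=sw).coef_.copy()
    coef_dropped = est.fit(X[:-5], y[:-5], sample_weight=sw[:-5]).coef_
    print(type(est).__name__, np.max(np.abs(coef_zero_weight - coef_dropped)))
```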