The point of LabelKFold is that instances with the same label end up in the same fold:
In [24]: cross_validation.LabelKFold([0,0,0,0,2,2,2,2], n_folds=2, shuffle=False, random_state=1).idxs
Out[24]: array([ 1., 1., 1., 1., 0., 0., 0., 0.])
However, with shuffle=True this guarantee no longer holds, and instances with the same label are split across folds:
In [25]: cross_validation.LabelKFold([0,0,0,0,2,2,2,2], n_folds=2, shuffle=True, random_state=1).idxs
Out[25]: array([ 0., 1., 1., 0., 1., 0., 1., 0.])
I believe the shuffle should be applied at an earlier stage, and should be applied to the labels as well.
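To illustrate the suggestion, here is a minimal sketch of shuffling at the label level instead of the sample level. It is not the scikit-learn implementation: the function name `label_fold_assignment` and the round-robin assignment of labels to folds are assumptions made for this example, and it uses only NumPy.

```python
import numpy as np


def label_fold_assignment(labels, n_folds=2, shuffle=False, random_state=None):
    """Return one fold index per sample; samples sharing a label share a fold.

    Hypothetical helper for illustration only, not part of scikit-learn.
    """
    labels = np.asarray(labels).ravel()
    # inverse[i] is the position of labels[i] within unique_labels
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    n_labels = len(unique_labels)
    if n_folds > n_labels:
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Shuffle the order of the distinct labels, never the individual samples.
    order = np.arange(n_labels)
    if shuffle:
        rng = np.random.RandomState(random_state)
        rng.shuffle(order)

    # Assumption: labels are dealt to folds round-robin in the (shuffled) order.
    fold_of_label = np.empty(n_labels, dtype=int)
    fold_of_label[order] = np.arange(n_labels) % n_folds

    # Broadcast each label's fold back to every sample carrying that label.
    return fold_of_label[inverse]


labels = [0, 0, 0, 0, 2, 2, 2, 2]
folds = label_fold_assignment(labels, n_folds=2, shuffle=True, random_state=1)
print(folds)
```

Because the permutation is applied to the distinct labels before any sample is assigned, every sample with a given label lands in the same fold whether or not shuffle is enabled; only which fold a label goes to changes.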