LabelKFold and shuffle=True gives incorrect results #5292

@andreasvc

The point of LabelKFold is that instances with the same label end up in the same fold:

In [24]: cross_validation.LabelKFold([0,0,0,0,2,2,2,2], n_folds=2, shuffle=False, random_state=1).idxs
Out[24]: array([ 1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.])

However, shuffle=True does not preserve this; samples that share a label end up in different folds:

In [25]: cross_validation.LabelKFold([0,0,0,0,2,2,2,2], n_folds=2, shuffle=True, random_state=1).idxs
Out[25]: array([ 0.,  1.,  1.,  0.,  1.,  0.,  1.,  0.])

I believe the shuffle should be applied at an earlier stage, and should be applied to the labels as well, so that all samples sharing a label move together.
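
To illustrate the idea, here is a minimal sketch of shuffling at the label level rather than the sample level. It is not the scikit-learn implementation: the helper name `shuffled_label_folds` is hypothetical, it uses only NumPy, and it deals labels to folds round-robin without trying to balance fold sizes.

```python
import numpy as np


def shuffled_label_folds(labels, n_folds, random_state=None):
    """Return a fold index per sample; samples with equal labels share a fold.

    Hypothetical helper for illustration only, not part of scikit-learn.
    """
    labels = np.asarray(labels)
    rng = np.random.RandomState(random_state)

    # Work on the distinct labels, not on the individual samples.
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    n_labels = len(unique_labels)
    if n_folds > n_labels:
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Shuffle the order of the distinct labels, then deal them to folds
    # round-robin. Fold sizes are not balanced in this sketch.
    order = rng.permutation(n_labels)
    label_to_fold = np.empty(n_labels, dtype=int)
    label_to_fold[order] = np.arange(n_labels) % n_folds

    # Every sample inherits the fold of its label.
    return label_to_fold[inverse]


idxs = shuffled_label_folds([0, 0, 0, 0, 2, 2, 2, 2], n_folds=2, random_state=1)
```

Because the permutation is drawn over the distinct labels, which fold a label lands in varies with random_state, but the first four and the last four samples above always stay together.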
