This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Data Shuffling inconsistency #286

@pluskid

Description

While reading the code for the io module, I found some inconsistencies, possibly bugs, that we need to fix:

  • It seems that data shuffling for the MNIST iterator is done only once, when the iterator is created (here). No re-shuffling happens when the iterator is reset at the end of an epoch. I'm not sure whether this is intended or a bug, since we have a unit test in the Python binding (see here) that explicitly asserts this behavior. I believe re-shuffling after each epoch helps convergence (though MNIST may be too easy a task to show a difference).
  • It seems that data shuffling for the image record iterator only takes effect after at least one epoch has completed (here). I have not yet worked through every part of the IO code, so please correct me if I'm wrong. If this is right, then regardless of whether the shuffle flag is on, the first epoch is never shuffled. That could hurt performance badly if the dataset is sorted by category. Although a typical image dataset is unlikely to be sorted that way, we should fix this to avoid a potential bug.
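To make the expected behavior concrete, here is a minimal sketch in plain Python (not MXNet's actual C++ io code) of an iterator that addresses both points: it shuffles the sample order before the first epoch and re-shuffles on every `reset()`. `ShuffledBatchIterator` and its parameters are hypothetical names chosen for illustration; they are not part of the MXNet API.

```python
import random
from typing import Iterator, List, Optional, Sequence


class ShuffledBatchIterator:
    """Hypothetical batch iterator (not an MXNet API) that shuffles the
    sample order before the first epoch and again on every reset()."""

    def __init__(self, data: Sequence, batch_size: int,
                 shuffle: bool = True, seed: Optional[int] = None):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.data = data
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.order: List[int] = list(range(len(data)))
        # Shuffle eagerly so the *first* epoch is already shuffled
        # (second point: shuffling should not wait until after one epoch).
        self._maybe_shuffle()

    def _maybe_shuffle(self) -> None:
        # Permute indices in place only when the shuffle flag is on.
        if self.shuffle:
            self.rng.shuffle(self.order)

    def reset(self) -> None:
        # Re-shuffle at every epoch boundary, not just once at creation
        # (first point: MNIST iterator shuffles only once).
        self._maybe_shuffle()

    def __iter__(self) -> Iterator[list]:
        # Yield batches following the current permutation; the last
        # batch may be smaller than batch_size.
        for start in range(0, len(self.order), self.batch_size):
            idx = self.order[start:start + self.batch_size]
            yield [self.data[i] for i in idx]
```

Shuffling an index list rather than the data itself keeps the underlying dataset untouched, and a per-iterator `random.Random(seed)` keeps runs reproducible without affecting global random state.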
