shuffle data from hdf5 datasets by PatWie · Pull Request #1347 · BVLC/caffe

PatWie · 2014-10-22T13:42:27Z

When the `shuffle` flag is set in the hdf5data layer, both the order in which the HDF5 files are read and the order of the entries within each HDF5 file are shuffled.
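To make the described behaviour concrete, here is a minimal Python sketch of a file-then-entry shuffle. It is not the code from this pull request (which is C++ inside Caffe): plain lists stand in for HDF5 files, and the name `two_level_shuffle` is hypothetical.

```python
import random


def two_level_shuffle(files, rng):
    """Shuffle the file order, then shuffle the entries inside each file.

    `files` is a list of lists; each inner list stands in for one HDF5 file
    (an assumption made for illustration only).
    """
    file_order = list(range(len(files)))
    rng.shuffle(file_order)          # level 1: order in which files are read
    result = []
    for idx in file_order:
        entries = list(files[idx])   # copy so the input is left untouched
        rng.shuffle(entries)         # level 2: order of entries in this file
        result.extend(entries)
    return result


files = [["a0", "a1", "a2"], ["b0", "b1"]]
order = two_level_shuffle(files, random.Random(0))
print(order)
```

Every entry appears exactly once, and the entries of one file always stay next to each other in the output.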

baeuml · 2014-10-24T09:20:30Z

Other than that it looks good to me! Cool!

PatWie · 2014-10-24T13:10:37Z

I changed some parts. ~~Should I open a new pull request, since rebasing does not work for me?~~ I did a `git push --force`.

Yangqing · 2014-10-25T00:18:28Z

Could you also do a speed benchmark and see how shuffling affects typical read speed? Random reads used to cause a lot of trouble when reading from a leveldb. Large-scale datasets usually don't need shuffling that much, so if speed is a concern, it might be better to keep sequential reads.

(Since shuffling is turned off by default, I think having the capability is good.)
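A benchmark of the kind requested could be sketched as follows. This is only a stand-in under loud assumptions: it times sequential versus shuffled-index reads of fixed-size records in a plain binary file using the Python standard library, not HDF5 or leveldb, and `RECORD_SIZE`, `NUM_RECORDS`, and the helper names are arbitrary choices for illustration.

```python
import os
import random
import tempfile
import time

RECORD_SIZE = 4096   # bytes per record (arbitrary)
NUM_RECORDS = 2000   # number of records (arbitrary)


def read_records(path, indices):
    """Read fixed-size records at the given indices, in the given order."""
    out = []
    with open(path, "rb") as f:
        for i in indices:
            f.seek(i * RECORD_SIZE)
            out.append(f.read(RECORD_SIZE))
    return out


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    with open(path, "wb") as f:
        for i in range(NUM_RECORDS):
            # Each record is the 4-byte index repeated to fill RECORD_SIZE.
            f.write(i.to_bytes(4, "little") * (RECORD_SIZE // 4))

    sequential = list(range(NUM_RECORDS))
    shuffled = sequential[:]
    random.Random(0).shuffle(shuffled)

    seq_data, seq_time = timed(read_records, path, sequential)
    rnd_data, rnd_time = timed(read_records, path, shuffled)

print(f"sequential: {seq_time:.4f}s  shuffled: {rnd_time:.4f}s")
```

On a small, freshly written file the operating system's page cache will likely hide most of the difference, so a meaningful measurement would need a dataset that does not fit in memory and the actual HDF5 reading path.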

PatWie · 2014-10-26T14:54:01Z

Not everybody is able to use Caffe on a high-end GPU for the ImageNet challenge ;-)
For smaller datasets it is crucial to use a random order (see issue #1249 from @bearpaw). Maybe the BVLC team should provide a standard HDF5 dataset and net layout for benchmarking code changes.

shelhamer · 2015-01-16T23:56:52Z

@jeffdonahue can you review and merge if this looks good to you?

jeffdonahue · 2015-01-17T22:22:03Z

src/caffe/proto/caffe.proto

Please clarify this comment to explain that the HDF5 files themselves are shuffled but the order within any given file is fixed.

wieschoo · 2015-01-19

Everything will be shuffled: both the HDF5 files and the entries within these HDF5 files.

jeffdonahue · 2015-01-19

Thanks, I see that now, my bad. I think it should still be clarified though -- it's not actually a full shuffle of the dataset (i.e., some orderings of the dataset are impossible to obtain) unless you only have a single HDF5 file (or each HDF5 file only has a single entry).
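The point about unreachable orderings can be checked by brute force on a toy case. The sketch below is an illustration only, with tuples standing in for HDF5 files and a hypothetical helper `reachable_orderings`; it enumerates every ordering a file-then-entry shuffle can produce and compares the count with all permutations of the dataset.

```python
from itertools import permutations, product


def reachable_orderings(files):
    """All orderings obtainable by permuting files, then entries within each file."""
    results = set()
    for file_perm in permutations(files):
        per_file = [list(permutations(f)) for f in file_perm]
        for combo in product(*per_file):
            # Concatenate the per-file orderings into one dataset ordering.
            results.add(tuple(e for chunk in combo for e in chunk))
    return results


files = [("a0", "a1"), ("b0", "b1")]
reachable = reachable_orderings(files)
all_orderings = set(permutations([e for f in files for e in f]))
print(len(reachable), "of", len(all_orderings))  # prints "8 of 24"
```

With two files of two entries each, only 2! × 2! × 2! = 8 of the 4! = 24 orderings are reachable; any ordering that interleaves entries from different files, such as `("a0", "b0", "a1", "b1")`, cannot occur.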

jeffdonahue · 2015-03-13T08:49:07Z

Replaced by #2118.

shuffle data

ee6bdf0

shelhamer added the enhancement label Oct 26, 2014

PatWie mentioned this pull request Dec 2, 2014

Shuffle hdf5 input data #1205

Closed

shelhamer assigned jeffdonahue Dec 2, 2014

jeffdonahue reviewed Jan 17, 2015

shelhamer added the JD label Mar 7, 2015

jeffdonahue mentioned this pull request Mar 13, 2015

Shuffle HDF5 data #2118

Merged

jeffdonahue closed this Mar 13, 2015

kukuruza mentioned this pull request Aug 13, 2015

switch from image collections to hdf5 kukuruza/City-Project#28

Closed

PatWie wants to merge 1 commit into BVLC:dev from PatWie:shufflehdf5
