Add support for cross-chunk shuffling in ChunkDataset #22347
Conversation
facebook-github-bot
left a comment
@zhangguanheng66 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot merge this please.

@zhangguanheng66 merged this pull request in f0f2331.
Summary: This change adds advanced support for cross-chunk shuffling. When training on a static dataset, the default configuration is sufficient. However, in some use cases new data is added to the dataset over each epoch, so the dataset's size grows dynamically. To mix the new data with the old data for better random sampling, one approach is to shuffle examples drawn from more than one chunk at a time. This change adds that capability: by specifying `cross_chunk_shuffle_count_` at construction, advanced users can control how many chunks examples are shuffled across. Pull Request resolved: pytorch#22347 Differential Revision: D16081378 Pulled By: zhangguanheng66 fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
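The semantics can be illustrated with a small standalone sketch (Python, with hypothetical names — this is not the actual C++ `ChunkDataset` implementation): with a cross-chunk shuffle count of *n*, examples from *n* chunks at a time are pooled together and shuffled within that pool, so examples from newly added chunks intermix with older ones instead of being consumed chunk-by-chunk.

```python
import random

def cross_chunk_shuffled(chunks, cross_chunk_shuffle_count, seed=0):
    """Yield examples shuffled across groups of `cross_chunk_shuffle_count` chunks.

    `chunks` is a list of lists of examples. All names here are
    illustrative only, not the libtorch API.
    """
    rng = random.Random(seed)
    for i in range(0, len(chunks), cross_chunk_shuffle_count):
        # Pool the next group of chunks, then shuffle within the pool.
        pool = [ex for chunk in chunks[i:i + cross_chunk_shuffle_count]
                for ex in chunk]
        rng.shuffle(pool)
        yield from pool

chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
examples = list(cross_chunk_shuffled(chunks, cross_chunk_shuffle_count=2))
# Every example appears exactly once; the first six examples are a
# mix drawn from chunks 0 and 1, the rest from chunk 2.
```

With a count of 1 (the default in the sketch's spirit), each chunk is shuffled independently, which matches the pre-existing behavior for static datasets.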