Skip to content

Conversation

@xzhu1900
Copy link
Contributor

This change adds one advanced support for cross-chunk shuffling.

For training with static dataset, the default configuration is at user's disposal. However, in some user cases, over each epoch, new data is added to the current dataset, thus the dataset's size is dynamically changing/increasing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples from more than 1 chunks. This feature is supported with this change. By specifying the cross_chunk_shuffle_count_ on construction, advanced user can specify how many chunks to shuffle example from.

@pytorchbot pytorchbot added the module: cpp Related to C++ API label Jun 28, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangguanheng66 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangguanheng66 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jaliyae
Copy link
Contributor

jaliyae commented Jul 2, 2019

@pytorchbot merge this please.

@facebook-github-bot
Copy link
Contributor

@zhangguanheng66 merged this pull request in f0f2331.

xzhu1900 added a commit to xzhu1900/pytorch that referenced this pull request Jul 5, 2019
Summary:
This change adds one advanced support for cross-chunk shuffling.

For training with static dataset, the default configuration is at user's disposal. However, in some user cases, over each epoch, new data is added to the current dataset, thus the dataset's size is dynamically changing/increasing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples from more than 1 chunks. This feature is supported with this change. By specifying the `cross_chunk_shuffle_count_` on construction, advanced user can specify how many chunks to shuffle example from.
Pull Request resolved: pytorch#22347

Differential Revision: D16081378

Pulled By: zhangguanheng66

fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants