Conversation

@xuhdev
Collaborator

@xuhdev xuhdev commented Jun 7, 2019

This commit parallelizes the variable initialization (from 1 to n) step on CPU.
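For reference, the parallelized initialization boils down to a kernel like the sketch below (based on the code discussed later in this thread; r__data and r__stride_0 are the result tensor's data pointer and first stride):

// Sketch: write the initial sequence r_[i] = i in parallel,
// letting parallel_for split [0, n) into chunks of at least GRAIN_SIZE.
at::parallel_for(0, n, internal::GRAIN_SIZE,
    [&r__data, &r__stride_0](int64_t p_begin, int64_t p_end) {
      for (int64_t i = p_begin; i < p_end; i++)
        r__data[i * r__stride_0] = static_cast<scalar_t>(i);
    });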

@xuhdev xuhdev force-pushed the parallel/randperm branch from 6520020 to 25ea431 Compare June 7, 2019 18:42
@xuhdev xuhdev changed the title Partially parallelize randperm. Partially parallelize randperm on CPU. Jun 7, 2019
@jeffreyksmithjr
Contributor

This could use a more detailed description, if there is more context around this change.

@jeffreyksmithjr jeffreyksmithjr added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 10, 2019
@xuhdev
Collaborator Author

xuhdev commented Jun 10, 2019

@jeffreyksmithjr Thanks, I updated the description. I don't have much more to say -- this is a pretty general performance improvement that is self-explanatory.

VitalyFedyunin
VitalyFedyunin previously approved these changes Jun 11, 2019
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Generally looks good, but consider:

  1. moving the CPU code into the native/cpu folder, as that will enable AVX support (a sketch of this pattern follows below);
  2. implementing a special code branch for the case when r__stride_0 == 1, since that will almost always be the case and the kernel will run much faster.
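For readers unfamiliar with the native/cpu layout, the usual DispatchStub pattern looks roughly like the sketch below (names such as randperm_stub are hypothetical; this is not code from this PR). Files under aten/src/ATen/native/cpu are compiled once per supported SIMD level, which is what enables AVX:

// In a native header / .cpp (sketch):
using randperm_fill_fn = void (*)(Tensor& result, int64_t n);
DECLARE_DISPATCH(randperm_fill_fn, randperm_stub);  // in a header
DEFINE_DISPATCH(randperm_stub);                     // in the native .cpp
// ... inside the operator implementation:
randperm_stub(kCPU, result, n);

// In aten/src/ATen/native/cpu/RandpermKernel.cpp (sketch):
static void randperm_fill_kernel(Tensor& result, int64_t n) {
  // parallel fill of the initial sequence, vectorizable per SIMD level
}
REGISTER_DISPATCH(randperm_stub, &randperm_fill_kernel);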

@xuhdev
Collaborator Author

xuhdev commented Jun 11, 2019

@VitalyFedyunin Thanks, this is really good to know. I will try to look into how to move more CPU functions into the native/cpu folder.

Do you happen to know where I can systematically learn this kind of information about PyTorch internals? It would be quite helpful if such a document were available.

@VitalyFedyunin
Contributor

We are working on better docs now; for the time being, you can look at aten/src/ATen/native/cpu/README.md.

It is also reasonable to attach your benchmark methodology and results, especially when you are working on performance-related tasks.
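For instance, a minimal timing sketch along these lines (purely illustrative; the size and repeat count are arbitrary, and it assumes the libtorch C++ API rather than any particular benchmark harness):

#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
  const int64_t n = 100000000;   // large enough for threading to matter
  torch::randperm(n);            // warm-up
  const int reps = 10;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i)
    torch::randperm(n);
  auto stop = std::chrono::steady_clock::now();
  std::cout << std::chrono::duration<double>(stop - start).count() / reps
            << " s per randperm(" << n << ")\n";
}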

@xuhdev xuhdev force-pushed the parallel/randperm branch 2 times, most recently from 4d0e5a0 to f226768 Compare June 13, 2019 23:15
@xuhdev
Collaborator Author

xuhdev commented Jun 13, 2019

@VitalyFedyunin I looked into the code again. I kind of want to change the implementation to make use of arange_out instead of implementing it once again, and leave all possible future optimization (if it ever happens) to arange_out. I have changed the current code to follow this thought. What do you think?

But I only did this for stride == 1 -- I looked into the code of range_cpu_out, and it seems to assume that the stride is always 1. I'm a bit confused: is it safe to assume the stride is 1, or does arange_out not handle it correctly?
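For context, the idea is roughly the sketch below (illustrative only, assuming an arange_out(Tensor& result, Scalar end) overload; not necessarily the final code in this PR):

// Sketch: delegate the initial fill to arange_out when the result is
// contiguous, so future optimizations of arange_out carry over; otherwise
// fall back to the strided parallel fill.
if (result.is_contiguous()) {
  at::arange_out(result, n);
} else {
  at::parallel_for(0, n, internal::GRAIN_SIZE,
      [&r__data, &r__stride_0](int64_t p_begin, int64_t p_end) {
        for (int64_t i = p_begin; i < p_end; i++)
          r__data[i * r__stride_0] = static_cast<scalar_t>(i);
      });
}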

@xuhdev xuhdev force-pushed the parallel/randperm branch from f226768 to bbcf7d7 Compare June 14, 2019 00:17
@zhangguanheng66
Contributor

@VitalyFedyunin Would you mind taking a look at the recent commits and landing the PR? Thanks.

@VitalyFedyunin
Contributor

Sure, go ahead with it. Even the previous state was fine. In fact, I recommend removing this ==1 check, as it is implemented incorrectly. What I meant by my previous comment was something like:

if (r__stride_0 == 1) {
  // Contiguous case: write to consecutive elements directly.
  at::parallel_for(0, n, internal::GRAIN_SIZE,
      [&r__data](int64_t p_begin, int64_t p_end) {
        for (int64_t i = p_begin; i < p_end; i++)
          r__data[i] = static_cast<scalar_t>(i);
      });
} else {
  // Strided case: scale the index by the stride.
  at::parallel_for(0, n, internal::GRAIN_SIZE,
      [&r__data, &r__stride_0](int64_t p_begin, int64_t p_end) {
        for (int64_t i = p_begin; i < p_end; i++)
          r__data[i * r__stride_0] = static_cast<scalar_t>(i);
      });
}

This helps avoid the multiplication in the contiguous case.

But the previous version, with just:

at::parallel_for(0, n, internal::GRAIN_SIZE,
    [&r__data, &r__stride_0](int64_t p_begin, int64_t p_end) {
      for (int64_t i = p_begin; i < p_end; i++)
        r__data[i * r__stride_0] = static_cast<scalar_t>(i);
    });

is also good.

@VitalyFedyunin VitalyFedyunin dismissed their stale review June 14, 2019 18:27

Please revert to the previous state, without the if check.

@xuhdev xuhdev force-pushed the parallel/randperm branch 2 times, most recently from 8698320 to 9f4cbd7 Compare June 14, 2019 19:42
@xuhdev
Collaborator Author

xuhdev commented Jun 14, 2019

Sure, restored.

@xuhdev xuhdev force-pushed the parallel/randperm branch from 9f4cbd7 to 3ce374a Compare June 14, 2019 22:45
@xuhdev xuhdev force-pushed the parallel/randperm branch from 3ce374a to 6345936 Compare June 15, 2019 16:21
Contributor

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@umanwizard
Contributor

@pytorchbot rebase this please

Contributor

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xuhdev xuhdev deleted the parallel/randperm branch June 20, 2019 18:18
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 20, 2019
Summary:
This commit parallelizes the variable initialization (from 1 to n) step on CPU.
Pull Request resolved: pytorch/pytorch#21529

Differential Revision: D15855402

Pulled By: VitalyFedyunin

fbshipit-source-id: f1ba54587451f9cb0eb5e542c3c5b458b48e1a3d
@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 0702b5f.

iotamudelta pushed a commit to ROCm/pytorch that referenced this pull request Jun 21, 2019
Summary:
This commit parallelizes the variable initialization (from 1 to n) step on CPU.
Pull Request resolved: pytorch#21529

Differential Revision: D15855402

Pulled By: VitalyFedyunin

fbshipit-source-id: f1ba54587451f9cb0eb5e542c3c5b458b48e1a3d