Reduce needless copying when returning lists of tensors in the JIT interpreter. #21690
Conversation
facebook-github-bot
left a comment
@resistor has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/csrc/jit/register_prim_ops.cpp
Outdated
Does moving all the elements of a and b create a use-after-move issue if a or b is accessed later?
Is it possible for someone to have another reference to them?
oh, yes it is. Add is a functional operation.
Force-pushed from fbefa94 to 5b0e8b9
driazati
left a comment
Looks good, though the resulting ops could be a little simpler by doing the dispatch on use_count at compile time by returning an Operation instead
This currently conflicts with #21170, which I think might have the same use-after-free bug as the original version of this...

I assume you meant to link to the …

I'd rather have the optimization inside the List class, so that more code could potentially profit from it. Afaik, #21896 is already doing that?
Is this going in soon? I have a PR to rebase past it.
Summary: In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`.

Results on my CPU (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:

```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```

Previously, `batch_gather jit` took 3x as long as `batch_gather`. Some logic taken from #21690. Note that these two PRs are somewhat orthogonal: that PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`. Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.

Pull Request resolved: #21896
Differential Revision: D15998628
Pulled By: Chillee
fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
smessmer
left a comment
looks good, thanks
The comment below is attached to this hunk:

```cpp
for (T b_element : b) {
  ret.push_back(std::move(b_element));
}
```

```cpp
if (a.use_count() == 1) {
```
Wondering if we should have a List::make_unshared_copy() that ensures that after calling it, use_count == 1 and does a copy if it needs to, or something for this. Feels like this could be useful in other ops as well.
This pull request has been merged in 7cc8f37.
Reduce needless copying when returning lists of tensors in the JIT interpreter. (pytorch#21690)

Summary: This fixes the JIT performance gap reported in https://twitter.com/VahidK/status/1138677898439561216

Pull Request resolved: pytorch#21690
Differential Revision: D15783709
fbshipit-source-id: 23bb4acda6b60c27e95667e1d53c7d261a87167d