Skip to content

Conversation

@Chillee
Copy link
Collaborator

@Chillee Chillee commented Jun 18, 2019

In talks with @smessmer, we decided that it'd be better to put the logic in list, as optimal behavior requires knowing .capacity()

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:

Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.

Previously, batch_gather jit took 3x as long as batch_gather.

Some logic taken from #21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for +=.

Note that we can't jit the vectorized version as we think torch.arange returns a float tensor.

@pytorchbot pytorchbot added oncall: jit Add this issue/PR to JIT oncall triage queue module: internals Related to internal abstractions in c10 and ATen labels Jun 18, 2019
// Get the appropriate builtin op for this augmented assignment
// If the RHS is a tensor, return the corresponding ATen in-place op
// If it's a list of scalars, then return the corresponding list augment op
Symbol getAugOp(const AugAssign& stmt, bool isTensor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you're doing here, this seems unrelated. Can we have this in a separate PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what's needed to make a += b do the inplace add instead of desugaring it into a = a + b.

Copy link
Contributor

@smessmer smessmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now. Thanks.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chillee is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 27, 2019
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.

Some logic taken from pytorch/pytorch#21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`.

Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
Pull Request resolved: pytorch/pytorch#21896

Differential Revision: D15998628

Pulled By: Chillee

fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
@facebook-github-bot
Copy link
Contributor

@Chillee merged this pull request in c9626a1.

xzhu1900 pushed a commit to xzhu1900/pytorch that referenced this pull request Jul 5, 2019
Summary:
In talks with smessmer, we decided that it'd be better to put the logic in `list`, as optimal behavior requires knowing `.capacity()`

Results on my cpu (for the benchmark here: https://twitter.com/VahidK/status/1138674536679821312) now look like this:
```
Pytorch batch_gather took 0.018311 seconds.
Pytorch batch_gather jit took 0.013921 seconds.
Pytorch vectorized batch_gather took 0.001384 seconds.
```
Previously, `batch_gather jit` took 3x as long as `batch_gather`.

Some logic taken from pytorch#21690. Note that these two PR's are somewhat orthogonal. That PR handles this benchmark by looking at the alias analysis, while this PR specializes for `+=`.

Note that we can't jit the vectorized version as we think `torch.arange` returns a float tensor.
Pull Request resolved: pytorch#21896

Differential Revision: D15998628

Pulled By: Chillee

fbshipit-source-id: b0085960da4613578b94deb98ac62c0a4532a8c3
@facebook-github-bot facebook-github-bot deleted the inplaceadd branch July 13, 2020 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: internals Related to internal abstractions in c10 and ATen oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants