Add aten mkldnn ops: relu, max_pool2d and avg_pool2d #19205
Conversation
Differential Revision: D14850598 Differential Version: 79209389
- func: max_pool2d_with_indices(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor)
  python_module: nn
  dispatch:
    CPU: max_pool2d_with_indices
isn't this just going to break derivatives for the same reason as before?
nope because the derivative of max_pool2d_with_indices has been defined in tools/autograd/derivatives.yaml
    ideep::algorithm::pooling_max);
return std::make_tuple(
    output,
    // TODO: ideep currently doesn't expose the max indices,
umm...would it work to just throw an exception if they ask for the indices?
There is no way to know in C++ whether indices are required. We could throw on the Python side (where return_indices is an argument), but that would require exposing is_mkldnn to the Python side (not necessarily a bad idea).
We could get the indices from mkldnn as well from a non-public api (@Jianhui-Li mentioned we can get it from the ideep::tensor get_extra)
Yes, we can use get_extra of the output ideep tensor. See this:
https://github.com/intel/ideep/blob/960ee3a20d975655d3fa8e40e445898a1e3b4078/include/ideep/computations.hpp#L3882
- The problem here is that the indices of mkldnn pooling differ from what PyTorch expects, in both dtype and content:
  a. mkldnn: indices is a local view w.r.t. the kernel window; its dtype can be either int32_t or int8_t.
  b. PyTorch: indices is a global view w.r.t. the input.
  See mkldnn_issue_259 for details. So directly returning the mkldnn indices is meaningless. The mkldnn indices are more like a cache used by backward during training; in that case, they should be handled by workspace in ATen.
- At the moment, probably the simplest way to address this is to remove the mkldnn path for mkldnn_max_pool2d_with_indices_out. That is, use mkldnn only when indices is not in the output; when indices is in the output, use THNN. This won't make any difference for models like ResNet or ResNeXt.
- In case we need to support unpooling (up-sampling), where indices is needed, we can convert the mkldnn indices to global indices by doing the math pointed to in the link above.
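To illustrate the conversion mentioned in the last point, here is a hypothetical pure-Python sketch (not the ideep API; the function name and the row-major local-index assumption are mine) of turning a within-kernel local index into PyTorch's global flat input index:

```python
def local_to_global_index(out_h, out_w, local_idx,
                          kernel_w, stride_h, stride_w,
                          pad_h, pad_w, in_w):
    """Map a within-kernel (local) max index to a flat global input index.

    Assumes the local index is row-major over the kernel window, and that
    the window for output position (out_h, out_w) starts at
    (out_h * stride_h - pad_h, out_w * stride_w - pad_w) in input coords.
    """
    h0 = out_h * stride_h - pad_h   # top-left row of the pooling window
    w0 = out_w * stride_w - pad_w   # top-left column of the pooling window
    dh, dw = divmod(local_idx, kernel_w)
    return (h0 + dh) * in_w + (w0 + dw)

# 4x4 input, 2x2 kernel, stride 2, no padding: output position (0, 1)
# with local index 3 (bottom-right of the window) maps to input (1, 3),
# i.e. flat index 1 * 4 + 3 = 7.
print(local_to_global_index(0, 1, 3, kernel_w=2, stride_h=2, stride_w=2,
                            pad_h=0, pad_w=0, in_w=4))  # -> 7
```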
@bddppq Due to the differences between the PyTorch indices tensor (using global indices) and the MKL-DNN indices tensor (using local indices), we are considering hiding the MKL-DNN indices inside ideep as an "opaque" format. Users can get PyTorch indices with to_dense(). This may take a while, and before that happens, how about we only implement max_pool2d without indices in the MKL-DNN path, as @mingfeima proposed?
aten max_pool2d right now doesn’t have dispatch (and is not even exposed to python), so in order to do that we will need to add derivatives for it to let codegen do the unwrap thing for it.
If you just want unwrap - there's a `requires_tensor: True` annotation you can put in native_functions.yaml.
As for exposing to python, I think if you mark it as `python_module: nn` or something like this, it'd be exposed only as torch._C._nn - I think it's ok to have it there but not call it from functional wrappers.
It's explicitly not exposed to python: https://github.com/pytorch/pytorch/blob/1c5073f/tools/autograd/gen_python_functions.py#L33
I don't dare to open this can of worms (actually I did at the very beginning and then ran into a test failure complaining derivative for max_pool2d is not implemented (or similar)).
Since
- indices are only used in training
- mkldnn/ideep will add support to expose indices in their api
Let's just return some garbage values and only support inference for now.
Hm, I think it should have worked if we were to check requires_grad. But I don't insist.
Can you please expand this comment with more details on why garbage is fine? (pretty much what we have discussed above)
Differential Revision: D14850598 Differential Version: 79296619
Differential Revision: D14850598 Differential Version: 79296710
Differential Revision: D14850598 Differential Version: 79299275
    MkldnnCPU: mkldnn_max_pool2d_with_indices_out

# Return: (Tensor output, Tensor indices)
- func: max_pool2d_with_indices(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor)
BTW: There is also max_pool2d, which doesn't need to return indices - useful for inference workloads. We can implement it more efficiently with MKL-DNN by setting the prop_kind to forward_inference.
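A sketch of what that indices-free path looks like from the user side (assuming a PyTorch build with MKL-DNN enabled; this mirrors the to_mkldnn()/to_dense() pattern used elsewhere in this thread):

```python
import torch

x = torch.randn(1, 3, 8, 8)                          # NCHW, float32
ref = torch.max_pool2d(x, kernel_size=2, stride=2)   # dense reference path

# The MKL-DNN path returns only the output, no indices, so it can use an
# inference-oriented primitive internally. Guard on backend availability.
if torch.backends.mkldnn.is_available():
    y = torch.max_pool2d(x.to_mkldnn(), kernel_size=2, stride=2).to_dense()
    assert torch.allclose(y, ref)
```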
    ceil_mode
);
}
MKLDNN treats padding differently with ceil mode. E.g. the following case results in a runtime error, for both max pooling and avg pooling:

import torch
x = torch.randn(1, 1, 10, 10, dtype=torch.float32) * 10
max_pool2d = torch.nn.MaxPool2d(kernel_size=(4, 5), stride=(3, 4), padding=(1, 2), ceil_mode=True)
# max_pool2d = torch.nn.AvgPool2d(kernel_size=(4, 5), stride=(3, 4), padding=(1, 2), ceil_mode=True)
output = max_pool2d(x.to_mkldnn()).to_dense()

Please refer to the Caffe code for computing the pooling output size for details. @XiaobingSuper
has this comment been addressed? (sorry I didn't follow the point)
This problem can be solved by changing the mkldnn padding as Caffe does: https://github.com/intel/caffe/blob/362a3b375e8f5bc64f5338d18f1742448cfb8fb7/src/caffe/layers/mkldnn_pooling_layer.cpp#L148
Please note that mkldnn achieves the given output size by adjusting the padding size.
Added an assert to support only non-ceil_mode for now. Let's do the padding trick in a follow-up PR.
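A pure-Python sketch of the size math behind that padding trick (the ceil-mode formula follows PyTorch's documented pooling output size; the clipping rule and the extra right/bottom padding are my reading of the Caffe code linked above, not code from this PR):

```python
import math

def pool_out_size(in_size, kernel, stride, pad, ceil_mode):
    """Pooling output size along one dimension."""
    if ceil_mode:
        out = math.ceil((in_size + 2 * pad - kernel) / stride) + 1
        # The last pooling window must start inside the left-padded input,
        # otherwise drop it (same clipping PyTorch applies in ceil mode).
        if (out - 1) * stride >= in_size + pad:
            out -= 1
        return out
    return (in_size + 2 * pad - kernel) // stride + 1

def extra_right_pad(in_size, kernel, stride, pad):
    """Extra right/bottom padding so that a floor-mode computation
    reproduces the ceil-mode output size (the Caffe-style trick)."""
    out = pool_out_size(in_size, kernel, stride, pad, ceil_mode=True)
    return max(0, (out - 1) * stride + kernel - in_size - pad)

# The failing case from this thread, height dimension: input 10, kernel 4,
# stride 3, padding 1. Ceil mode expects 4 outputs; floor mode gives 3,
# so mkldnn needs 2 extra rows of bottom padding to match.
print(pool_out_size(10, 4, 3, 1, ceil_mode=True))   # -> 4
print(pool_out_size(10, 4, 3, 1, ceil_mode=False))  # -> 3
print(extra_right_pad(10, 4, 3, 1))                 # -> 2
```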
Differential Revision: D14850598 Differential Version: 79689942
Differential Revision: D14850598 Differential Version: 79698402
Differential Revision: D14850598 Differential Version: 79729546
dzhulgakov
left a comment
Looks good as long as the sizes calculation comment has been addressed
Differential Revision: D14850598 Differential Version: 80386616
Differential Revision: D14850598 Differential Version: 80533693
Differential Revision: D14850598 Differential Version: 80534912
Differential Revision: D14850598 Differential Version: 80541664
Differential Revision: D14850598 Differential Version: 80560023
Differential Revision: D14850598 Differential Version: 80580144
Differential Revision: D14850598 Differential Version: 80683622
Tensor new_with_itensor_mkldnn(ideep::tensor&& it, const TensorOptions& options) {
  // NOTE: int32_t dims from ideep::tensor but sizes needs int64_t
  // TODO: support int64_t dims in ideep::tensor to avoid extra conversion
  AT_ASSERT(!it.has_extra());
why did we originally have it here?
    IntArrayRef padding,
    bool ceil_mode,
    bool count_include_pad) {
  output = _mkldnn_pool2d(
hm, I think it's wrong. _out variants are supposed to assign things in-place in the same Tensor instance, no? @gchanan to double-check
and I'd add a test for it too
I will add a test case, but IIRC the _out version doesn't require same instance (the _ version does).
You are right, _out requires the same identity. We will need to change the opaque tensor impl to allow setting sizes to support this. In this PR let's not support in-place avg_pool for now.
Usually _out versions just fail if the size is wrong. But I agree that we can just skip _out version here
Oh so normally the output tensor got passed in is supposed to already have the right sizes? (and other meta info like dtype)
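For reference, a minimal demonstration of the `_out` contract discussed here - the passed-in tensor is written in place and returned as the very same object (using `torch.add` only as a stand-in example):

```python
import torch

a = torch.ones(2, 2)
b = torch.ones(2, 2)
c = torch.empty(2, 2)

result = torch.add(a, b, out=c)

# The out= tensor keeps its identity: the op writes into it in place and
# returns the same object rather than allocating a new tensor.
assert result is c
assert torch.equal(c, torch.full((2, 2), 2.0))
```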
- func: max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor

- func: mkldnn_max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> Tensor
  requires_tensor: True
requires_tensor is probably not required here
yep, just for documentation purposes
Differential Revision: D14850598 Differential Version: 80773731
Differential Revision: D14850598 Differential Version: 80780046
Differential Revision: D14850598 Differential Version: 80785119
Differential Revision: D14850598 Differential Version: 80799727
Summary: Pull Request resolved: pytorch/pytorch#19205 Reviewed By: dzhulgakov Differential Revision: D14850598 fbshipit-source-id: 5bbd5909c06df9c980de680ffb81bf772766c0ba
This pull request has been merged in 4864000.
Summary: Modify MKLDNN pooling operation to support ceil mode by adjusting the right/bottom padding accordingly. This is done similarly as in Caffe (see discussion #19205 (comment)). To make this possible, I split the padding to left and right (top / bottom). This naming is confusing but actually follows mkldnn's own naming for pooling::compute(). We increase the r paddings so that it matches the ceiling mode expected output size. Strengthened the test case. Pull Request resolved: #21310 Reviewed By: bddppq Differential Revision: D15611664 Pulled By: akyrola fbshipit-source-id: 46b40015dafef69a8fd5e7b2c261d8dbf448cd20
Stack:
:white_circle: #19633 Add is_mkldnn to at::Tensor 💚
:white_circle: #19204 Add aten mkldnn conv2d operator 💚
:black_circle: #19205 Add aten mkldnn ops: relu, max_pool2d and avg_pool2d 💚
:white_circle: #19206 Add aten mkldnn batch_norm operator 💚
:white_circle: #19207 Add aten mkldnn add operator 💚
:white_circle: #19209 Add aten mkldnn view operator 💚
:white_circle: #19210 Add aten mkldnn linear operator 💚
:white_circle: #19648 Adjust resnext run script 💚
Pull Request resolved: #19205
Differential Revision: D14850598