
Conversation

f0k (Contributor) commented Jun 27, 2019

Fixes #21935 by using the integer floor division that was introduced for convolution shapes in #9640. Without this fix, the pooling operators can produce a 1-element output in cases where they should raise an error instead.

Disclaimer: I couldn't properly test this locally (my build is not picking up the modified version for some reason). I'm marking this WIP until I've checked what the CI tools say...

@pytorchbot pytorchbot added module: cpu CPU specific problem (e.g., perf, algorithm) module: nn Related to torch.nn module: operators labels Jun 27, 2019

f0k commented Jun 28, 2019

The test I added failed, so the test itself seems good. I don't see what's wrong with the implementation, though. When testing locally, pooling_output_shape() does not seem to be called at all, so my correction has no effect -- I even tried adding +10 and a std::cout without seeing any change. Adding invalid statements does give me compilation errors, though, so the file is at least being compiled.

Quick one-liner to test the implementation:

python -c 'import torch; print(torch.nn.functional.max_pool1d(torch.zeros((1, 1, 4)), 3, stride=2, padding=0, dilation=2, ceil_mode=False).shape)'

This should raise a RuntimeError, but right now it prints torch.Size([1, 1, 1]). Any idea what's going on?
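For context, a minimal Python sketch of the shape arithmetic (function and parameter names are mine, not PyTorch's) shows why this case must error: the numerator of the shape formula is negative because the dilated kernel does not fit into the input even once, and C-style truncating division rounds it toward zero instead of toward negative infinity:

```python
def pool_shape(size, kernel, stride, pad, dilation, use_floor):
    # numerator of the shape formula; negative when the (dilated)
    # kernel does not fit into the padded input even once
    num = size + 2 * pad - dilation * (kernel - 1) - 1
    if use_floor:
        q = num // stride      # floor division: rounds toward -inf
    else:
        q = int(num / stride)  # C-style division: truncates toward 0
    return q + 1

# the failing case: input 4, kernel 3, stride 2, pad 0, dilation 2
print(pool_shape(4, 3, 2, 0, 2, use_floor=False))  # 1 (bogus output)
print(pool_shape(4, 3, 2, 0, 2, use_floor=True))   # 0 (invalid, should raise)
```

Here num = 4 - 2*2 - 1 = -1; truncation gives 0 + 1 = 1, while flooring gives -1 + 1 = 0, which correctly signals an invalid configuration.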


f0k commented Jun 28, 2019

Okay, I grepped the source code for my (ceil_mode ? stride - 1 : 0) construct, and found that #20691 has copied it to another place: https://github.com/pytorch/pytorch/blob/31e1e63/aten/src/ATen/native/Pool.h#L24
This is annoying. I should have thought of this earlier instead of wasting my time getting the test right and tracing through the code.
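For illustration, here is the idea behind that construct, sketched in Python with names of my own choosing, simplified to ignore dilation and the final ceil_mode window-start correction: adding stride - 1 to the numerator before flooring turns floor division into ceiling division.

```python
def div_rtn(a, b):
    # round-toward-negative-infinity division; Python's // already floors,
    # unlike C/C++ integer division, which truncates toward zero
    return a // b

def out_size(size, kernel, stride, pad, ceil_mode):
    # the (ceil_mode ? stride - 1 : 0) construct: adding stride - 1 to the
    # numerator before flooring implements ceiling division
    extra = stride - 1 if ceil_mode else 0
    return div_rtn(size + 2 * pad - kernel + extra, stride) + 1

# input 5, kernel 2, stride 2: floor mode -> 2 windows, ceil mode -> 3
print(out_size(5, 2, 2, 0, ceil_mode=False))  # 2
print(out_size(5, 2, 2, 0, ceil_mode=True))   # 3
```

This is why the same construct appearing in two source files is a problem: a fix to the division semantics in one copy silently leaves the other copy broken.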

@f0k f0k force-pushed the fix-negative-pool-shape branch from 0dd2b88 to d54b7d6 Compare June 28, 2019 12:31
@pytorchbot pytorchbot added the module: cuda Related to torch.cuda, and CUDA support in general label Jun 28, 2019

f0k commented Jun 28, 2019

Amended my commit to change all copies of the code. Fingers crossed.

Is there a way to get rid of the copies? Can THNN and THCUNN use the ATen version? Or is there another shared place for such things?

@f0k f0k changed the title WIP: Use integer floor division for pooling shape computation Use integer floor division for pooling shape computation Jun 28, 2019
@f0k f0k force-pushed the fix-negative-pool-shape branch from d54b7d6 to 5deb60e Compare June 28, 2019 13:45
@soumith soumith requested review from ezyang and skrah June 29, 2019 04:47
@f0k f0k changed the title Use integer floor division for pooling shape computation WIP: Use integer floor division for pooling shape computation Jul 1, 2019

f0k commented Jul 1, 2019

Status: My test passes, but there's another test that relied on the broken pooling behavior. Will look into that later (maybe not today); marked as WIP again.


ezyang commented Jul 1, 2019

Is there a way to get rid of the copies? Can THNN and THCUNN use the ATen version? Or is there another shared place for such things?

Yes please! There is no technical barrier here, it's purely a matter of code organization.


f0k commented Jul 2, 2019

Can THNN and THCUNN use the ATen version?

Yes please! There is no technical barrier here, it's purely a matter of code organization.

The THNN version has been extended with support for asymmetric padding, though (in #21310), which neither the THCUNN nor the ATen version have. Shall this be ported over then? Otherwise THNN cannot use ATen. Is ATen supposed to replace THNN and THCUNN in the long run, or are they meant to co-exist?


ezyang commented Jul 2, 2019

ATen is meant to replace THNN/THCUNN; the way we are killing the old code is by porting it to ATen (the most notable change in a port is the removal of manual refcounting). We should not have accepted the previous PR if there was already a copy of the code in ATen. @akyrola, do you have an opinion here? The correct fix at this point is to make sure asymmetric padding works in all versions of the code (and to deduplicate, if there are multiple copies). I have not actually looked into the situation.


akyrola commented Jul 2, 2019

@ezyang I am not really involved with PyTorch development at this point, so I don't have an opinion.


f0k commented Jul 3, 2019

The correct fix at this point is to make sure asymmetric padding works for all versions of the code (and deduplicate code, if there are multiple copies). I have not actually looked into the situation.

OK. As far as I understand, asymmetric padding was added to support ceil_mode pooling with mkldnn, which doesn't support it natively. That's why it hadn't been added to the THCUNN version. So I think the correct fix will be to have https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/mkldnn/Utils.cpp use the ATen copy of the pooling size implementation, not the THNN one. The ATen copy will have to be adapted to match the THNN one. I can do that.

I will then try deleting the THNN and THCUNN versions to see if anything else still depends on them that should be using ATen instead. Does that make sense?

Finally, I will check the test that relied on the broken pooling behavior.


ezyang commented Jul 3, 2019

I will then try deleting the THNN and THCUNN versions to see if anything else still depends on them that should be using ATen instead. Does that make sense?

That'd be great! Thank you for your hard work!

@pytorchbot pytorchbot added the oncall: quantization Quantization support in PyTorch label Jul 9, 2019
@pytorchbot pytorchbot added the module: mkldnn Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration label Jul 9, 2019

f0k commented Jul 9, 2019

The failing existing test had a Python reimplementation of the pooling shape computation, but used float division instead of integer division. That test only started failing now that the pooling shape computation in torch itself was corrected. I fixed the Python implementation.
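To illustrate the pitfall (with hypothetical helper names, simplified to drop padding and dilation): for negative numerators, applying int() to a float quotient truncates toward zero, so a float-based reimplementation reproduces exactly the off-by-one this PR fixes.

```python
def shape_float(size, kernel, stride):
    # float-based version: int() truncates toward zero, so a
    # negative numerator yields 0 instead of -1 before the +1
    return int((size - kernel) / stride) + 1

def shape_int(size, kernel, stride):
    # integer floor division matches the corrected C++ div_rtn
    return (size - kernel) // stride + 1

# size 2, kernel 3 (kernel larger than input): numerator is -1
print(shape_float(2, 3, 2))  # 1 -- masks the invalid configuration
print(shape_int(2, 3, 2))    # 0 -- correctly signals no valid output
```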

There were two references to the THNN pooling shape function; I replaced them with the ATen version and removed the THNN and THCUNN versions completely.


f0k commented Jul 10, 2019

I think this is good to go, so I'll remove the WIP label.

The only failing test is unrelated to this change, as far as I can see:

Jul 09 16:47:24 CMake Error at cmake/Modules/FindOpenMP.cmake:287 (cmake_parse_implicit_link_info):
Jul 09 16:47:24   CMAKE_PARSE_IMPLICIT_LINK_INFO Function invoked with incorrect arguments
Jul 09 16:47:24   for function named: CMAKE_PARSE_IMPLICIT_LINK_INFO
Jul 09 16:47:24 Call Stack (most recent call first):
Jul 09 16:47:24   cmake/Modules/FindOpenMP.cmake:481 (_OPENMP_GET_FLAGS)
Jul 09 16:47:24   cmake/Modules/FindMKL.cmake:172 (FIND_PACKAGE)
Jul 09 16:47:24   cmake/Modules/FindMKL.cmake:265 (CHECK_ALL_LIBRARIES)
Jul 09 16:47:24   cmake/Dependencies.cmake:131 (find_package)
Jul 09 16:47:24   CMakeLists.txt:270 (include)

@f0k f0k changed the title WIP: Use integer floor division for pooling shape computation Use integer floor division for pooling shape computation Jul 10, 2019
@facebook-github-bot facebook-github-bot left a comment


@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 17, 2019
Summary:
Fixes pytorch/pytorch#21935 by using the integer floor division that was introduced for convolution shapes in pytorch/pytorch#9640. Without this fix, the pooling operators can produce a 1-element output in cases they shouldn't.

Disclaimer: I couldn't properly test it locally (it's not picking up the modified version for some reason). I'm marking this WIP until I checked what the CI tools say...
Pull Request resolved: pytorch/pytorch#22304

Differential Revision: D16181955

Pulled By: ezyang

fbshipit-source-id: a2405372753572548b40616d1206848b527c8121
@facebook-github-bot

@ezyang merged this pull request in 5adba33.


This pull request closed the linked issue: Unexpected output size for Maxpool (#21935).