[numpy] torch.{all/any} : output dtype is always bool #47878
kshitij12345 wants to merge 45 commits into pytorch:master
Conversation
CI failures summary (Dr. CI, as of commit b145649): 1 failure not recognized by patterns.
aten/src/ATen/native/ReduceOps.cpp (outdated)
```diff
 } else {
   auto iter = make_reduction(
-      "all", result, self, dim, keepdim, self.scalar_type());
+      "all", result, self, dim, keepdim, /*out_dtype=*/kBool);
```
this invocation of make_reduction will explicitly cast input to bool type, I doubt that's what you want.
I think we need that; dynamic casting is not supported for CPU:
pytorch/aten/src/ATen/native/cpu/Reduce.h
Lines 247 to 254 in 0652d75
And the inferred type of the data is based on the return type of the op passed to binary_kernel_reduce_vec:
pytorch/aten/src/ATen/native/cpu/Reduce.h
Lines 35 to 36 in 0652d75
pytorch/aten/src/ATen/native/cpu/Reduce.h
Lines 13 to 17 in 0652d75
Let me know if I am missing something 🤔
Thanks!
It should be possible to not always cast the inputs to bool. I think there are two cases:
- both inputs have the same scalar type
- the inputs have different scalar types
For the first case the conversion to bool can happen in the kernel.
For the second case the inputs do need to be copied to avoid excessive template instantiations. In this case the inputs might as well be cast to bool ahead of the kernel being called.
The first case, where the scalar types are the same, is probably much more common than different scalar types, and would be nice to optimize for. Would it be possible to address that case?
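A rough sketch of the two paths described above, using NumPy as Python-level pseudocode (the function names are hypothetical; the real implementation lives in the C++ reduction kernels):

```python
import numpy as np

def all_same_dtype(a):
    # Case 1 (input and accumulator share a scalar type): truth-test each
    # element inside the reduction loop itself; the input is never copied.
    result = True
    for v in a.ravel():
        result = result and bool(v)
    return np.bool_(result)

def all_mixed_dtype(a):
    # Case 2 (different scalar types): cast the input to bool up front,
    # then run the plain bool reduction kernel on the copy.
    return np.bool_(a.astype(bool).all())

a = np.array([1, 2, 0], dtype=np.uint8)
assert all_same_dtype(a) == all_mixed_dtype(a) == np.all(a)
```

Both paths produce the same result; case 1 just avoids materializing a bool copy of the input when the dtypes already match.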
> The first case, where the scalar types are the same, is probably much more common than different scalar types, and would be nice to optimize for. Would it be possible to address that case?
This is already taken care of:
pytorch/aten/src/ATen/native/ReduceOpsUtils.h
Lines 173 to 194 in 72918e4
At the bottom there is:
pytorch/aten/src/ATen/native/ReduceOpsUtils.h
Lines 190 to 192 in 72918e4
Benchmark Script

```python
import itertools
import sys
import time

import torch
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare

print('Using pytorch %s' % (torch.__version__))

shapes = [(32,), (32, 32), (2, 16, 32), (2, 16, 32, 32), (8, 16, 64, 64)]
num_threads = 8
dtypes = [torch.bool, torch.float]
repeats = 10

for dtype in dtypes:
    results = []
    for mat1_shape in shapes:
        mat1 = torch.randn(*mat1_shape, device='cpu').to(dtype)
        mat1_cuda = mat1.to('cuda')
        tasks = [("torch.all(mat1)", "torch.all CPU"),
                 # In CUDA, this path is taken only when training is False.
                 ("torch.all(mat1_cuda)", "torch.all CUDA")]
        timers = [Timer(stmt=stmt, num_threads=num_threads, label=f"All {dtype}",
                        sub_label=f"{mat1_shape}", description=label,
                        globals=globals())
                  for stmt, label in tasks]
        for i, timer in enumerate(timers * repeats):
            results.append(timer.blocked_autorange())
            print(f"\r{i + 1} / {len(timers) * repeats}", end="")
            sys.stdout.flush()
    comparison = Compare(results)
    comparison.print()
```

Before PR (dtype is same as input and hence no casting) vs. After PR:

Performance of CUDA has taken a hit. I think for CUDA we can avoid the casting.

EDIT
Codecov Report

```
@@            Coverage Diff             @@
##           master   #47878      +/-   ##
===========================================
+ Coverage   34.74%   80.92%   +46.17%
===========================================
  Files         460     1855     +1395
  Lines       58048   200200   +142152
===========================================
+ Hits        20170   162014   +141844
- Misses      37878    38186      +308
```
@ngimel PTAL
Please update the docs; currently the docs say that any/all are unique to BoolTensor.
heitorschueroff left a comment
I made some comments on the docs. For the testing, are you testing for discontiguous/strided input tensors?
* update docs
* update test for non-contiguous tensor
@mruberry Have made the necessary changes to preserve dtype when input dtype is `uint8`.
aten/src/ATen/native/ReduceOps.cpp (outdated)
```cpp
TORCH_CHECK(self.layout() == Layout::Strided,
            "any only supports strided layout, got: ", self.layout());
// Refer [all, any : uint8 compatibility]
TORCH_CHECK(result.scalar_type() == ScalarType::Bool || result.scalar_type() == ScalarType::Byte,
```
This needs a slight update.
I suggest separating these lines into two TORCH_CHECKs. The first one checks if result's dtype is bool OR self's dtype is byte. If that check fails then the error message should report that any "expected" a boolean 'out' tensor but got ...
The second TORCH_CHECK checks that result's dtype is byte or self's dtype is not byte. If that check fails then the error message should report that any "expected" an int8 'out' tensor but got ...
These conditionals can be phrased slightly differently if you prefer.
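A literal Python transcription of the two suggested checks (dtype names as strings; this is a hypothetical helper for illustration, the real code would be two C++ TORCH_CHECKs):

```python
def check_any_out_dtype(self_dtype: str, result_dtype: str) -> None:
    # First check: result's dtype is bool OR self's dtype is byte (uint8).
    if not (result_dtype == "bool" or self_dtype == "uint8"):
        raise RuntimeError(
            f"any expected a boolean 'out' tensor, but got {result_dtype}")
    # Second check: result's dtype is byte OR self's dtype is not byte.
    if not (result_dtype == "uint8" or self_dtype != "uint8"):
        raise RuntimeError(
            f"any expected an int8 'out' tensor, but got {result_dtype}")

check_any_out_dtype("float32", "bool")  # ok: bool out for a non-byte input
check_any_out_dtype("uint8", "uint8")   # ok: byte out for a byte input
```

Anything else (e.g. a float32 `out` tensor) trips the first check and reports the "expected a boolean 'out' tensor" message.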
```python
    torch.any(x, dim, out=out)
    self.assertEqual(expected, out)
else:
    with self.assertRaisesRegex(RuntimeError, "all only supports bool tensor for result, got"):
```
These error messages will need to be updated, too
test/test_reductions.py (outdated)

```python
self.compare_with_numpy(torch_fn, np_fn, x, exact_dtype=exact_dtype)

def _test_out_variant(x, dim):
    out = torch.empty_like(x)
```
The actual output will be a scalar tensor, right? Let's make out a scalar tensor here, too. Otherwise, when we start throwing runtime errors on resizing out= tensors, we'll have to update this.
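For intuition, a NumPy analog of the suggestion (the torch version would allocate something like `torch.empty((), dtype=torch.bool)` instead of `empty_like(x)`):

```python
import numpy as np

x = np.array([True, False, True])

# A 0-d ("scalar") out matches the fully-reduced result shape exactly,
# so no implicit resize of the out tensor is needed.
out = np.empty((), dtype=bool)
np.all(x, out=out)
assert out.shape == () and out.item() == False
```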
mruberry left a comment
Nice updates, @kshitij12345! Two small comments inline.
I'm going to import this now to run some additional BC-compat tests on it.
facebook-github-bot left a comment
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Is it ok if I push the new changes, or will it stop the internal BC-compat tests?

Go ahead. It won't affect them.
Update: this is passing our internal BC-compat tests for existing models, which is great news. Let's make those few last tweaks and land this.
@mruberry If I understand correctly, the new behavior is
If that is the case, I will work on the pt/xla PR.
Hey @JackCaoG! Was just about to ping you now that this is looking OK. It's just:
No rush. Obviously we'll wait for the pt/xla PR to be ready. Sorry that we had to thrash on this for BC concerns.
Summary:

BC-breaking note: This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.)

PR summary: pytorch#44790 (comment). Fixes 2 and 3. Also fixes pytorch#48352.

Changes
* Output dtype is always `bool` (consistent with numpy). **BC-breaking** (previously used to match the input dtype)
* Uses vectorized version for all dtypes on CPU
* Enables test for complex
* Update doc for `torch.all` and `torch.any`

TODO
* [x] Update docs
* [x] Benchmark
* [x] Raise issue on XLA

Pull Request resolved: pytorch#47878
Reviewed By: H-Huang
Differential Revision: D25421263
Pulled By: mruberry
fbshipit-source-id: c6c681ef94004d2bcc787be61a72aa059b333e69
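The NumPy behavior referenced in the BC-breaking note can be checked directly; a quick sketch:

```python
import numpy as np

# np.all / np.any return bool regardless of the input dtype, which is the
# behavior this PR adopts for torch.all / torch.any.
for dtype in (np.uint8, np.float32, np.complex64):
    a = np.array([[1, 0, 1], [1, 1, 0]], dtype=dtype)
    assert np.all(a, axis=0).dtype == np.bool_
    assert np.any(a).dtype == np.bool_
print("np.all / np.any always return bool")
```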
Hi @mruberry, the XLA PR is ready; I will merge it after this one is merged.
Good morning, @JackCaoG. Final tests are running. Expected merge time is around 11AM PST today. I'll keep you updated.
@mruberry It seems like the pytorch PR has been merged? I will go ahead and merge the xla PR.
@JackCaoG Yep, sorry, was in a meeting. Merged.
Summary:

BC-breaking note: This PR changes the behavior of the any and all functions to always return a bool tensor. Previously these functions were only defined on bool and uint8 tensors, and when called on uint8 tensors they would also return a uint8 tensor. (When called on a bool tensor they would return a bool tensor.)

PR summary: pytorch#44790 (comment). Fixes 2 and 3. Also fixes pytorch#48352.

Changes
* Output dtype is always `bool` (consistent with numpy). **BC-breaking** (previously used to match the input dtype)
* Uses vectorized version for all dtypes on CPU
* Enables test for complex
* Update doc for `torch.all` and `torch.any`

TODO
* [x] Update docs
* [x] Benchmark
* [x] Raise issue on XLA

Pull Request resolved: pytorch#47878
Reviewed By: albanD
Differential Revision: D25714324
Pulled By: mruberry
fbshipit-source-id: a87345f725297524242d69402dfe53060521ea5d
PR summary:
#44790 (comment)
Fixes 2 and 3
Also Fixes #48352

Changes
* Output dtype is `bool` (consistent with numpy), except for `uint8`, where it's `uint8`
* Update doc for `torch.all` and `torch.any`

TODO