This issue is for tracking and organizing tasks related to reduction operators. For 1.10 we are focusing on aligning the API with the Python Array API Standard and on test improvements.
Note that some of the tasks below need further discussion before we decide to commit to them.
API
Reduction operators should implement the Python Array API Standard, which includes supporting all data types (unless the documentation notes otherwise) and performing type promotion. They should be compatible with NumPy where possible, giving higher priority to the Python Array API Standard where the two disagree. Where applicable, they should support reducing over multiple dimensions.
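As an illustration (not one of the tracked tasks), here is a minimal sketch of the behavior this implies: multi-dimensional dim arguments, keepdim, and type promotion for boolean inputs. It uses only the current torch API; the exact promotion rules for integer and boolean inputs are still under discussion in the issues below.

```python
import torch

x = torch.randn(2, 3, 4)

# Reduce over multiple dimensions at once, as the Array API requires.
s = torch.sum(x, dim=(0, 2))                  # shape: (3,)

# keepdim retains the reduced dimensions with size 1.
k = torch.sum(x, dim=(0, 2), keepdim=True)    # shape: (1, 3, 1)

# Boolean inputs are promoted for sum-like reductions (bool -> int64 today);
# mean-like reductions should promote to a floating-point dtype, which
# several of the issues below track.
b = torch.ones(5, dtype=torch.bool)
print(torch.sum(b).dtype)                     # torch.int64
```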
Python Array API Standard
- Add a new schema type for representing dimension lists #44409
- Some reduction operators have double signatures #61486
- Deprecate torch.(min|max|median|mode) to only return values and not indices #61490
- Many reduction operators do not support reducing over multiple dimensions #61582
- Replace unbiased parameter in torch.(std|var|std_mean|var_mean) with correction=0 #61492
- torch.any and torch.all map uint8 -> uint8 but should map uint8 -> bool #50342
- Support bool input tensors for argmax / argmin / sort / topk and other functions #35529
NumPy compatibility
- torch.var and torch.std are not compatible with np.var and np.std #50010
- torch.sum(tensor, dim=()) is different from np.sum(arr, axis=()) #29137
- Make operators like logsumexp and cumsum operate over dimension 0 by default (or at least for 1D arrays) #19149
Scalar and empty tensors
- torch.sum(tensor(2.), dim=0) (and probably other reduction functions) doesn't make sense #46999
- Enable applying min, max, argmin, argmax over nonzero dims of tensor containing dims of size 0 #28380
Variants
- tensor.var_mean variant for existing torch.var_mean (and same for std_mean) #24041
- [feature request] Out-variant and dtype argument for torch.argmax / torch.argmin / torch.argsort (and friends) #32570
Type Promotion
- Support torch.mean for BoolTensors and other integer tensor inputs (without manual upcasting and hopefully without hidden upcasting) #45833
- Unexpected error when passing integer tensor to logsumexp #56132
Testing
Reduction operators have more structure than other operators, such as the dim and keepdim parameters, which can be exploited to automate testing of many features. To do so, we can create a ReductionInfo subclass of OpInfo and update test_reductions.py to follow the OpInfo pattern. Some examples of tests that can be automated are ensuring reduction operators support reducing over multiple dimensions, reducing over nonzero dimensions of empty tensors, and even testing correctness against a reference implementation such as the equivalent NumPy operator (see the sketch after this list).
- Write a ReductionInfo subclass of OpInfo #49746
- Test TensorIterator reductions that require 64-bit indexing #59550
- TestReductionsCPU.test_nansum_out_dtype_cpu is flaky, but its flakiness is currently hidden #59415
- test_reductions ignoring some tests #53704
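For illustration, a minimal sketch of the kind of reference-based test a ReductionInfo entry could drive. Note that ReductionInfo does not exist yet, so this stands in with a plain parametrized list of (torch op, NumPy reference) pairs rather than the eventual OpInfo-based machinery.

```python
import numpy as np
import pytest
import torch

# Hypothetical stand-in for ReductionInfo entries: each pairs a torch
# reduction with its NumPy reference implementation.
REDUCTIONS = [
    (torch.sum, np.sum),
    (torch.amax, np.amax),
    (torch.mean, np.mean),
]


@pytest.mark.parametrize("torch_op,np_op", REDUCTIONS)
@pytest.mark.parametrize("dim", [0, 1, (0, 1), (0, 2)])
@pytest.mark.parametrize("keepdim", [False, True])
def test_reduction_matches_numpy(torch_op, np_op, dim, keepdim):
    # Compare reducing over single and multiple dimensions, with and
    # without keepdim, against the NumPy reference.
    x = torch.randn(2, 3, 4)
    expected = np_op(x.numpy(), axis=dim, keepdims=keepdim)
    actual = torch_op(x, dim=dim, keepdim=keepdim)
    torch.testing.assert_close(actual, torch.from_numpy(np.asarray(expected)))
```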
Bugs
- out variant of many loss functions are not consistent with non-out variant when reduction is not none #50382
- torch.mode when input has nans #46225
- torch.mean(x, dims=[]) has incorrect gradient in 1.2 #28993
- Can't torch.sum(tensor, dim) where dim >= 64 #23159
- F.nll_loss with 16-bit CUDA tensors and reduction=mean produces NaNs #61523
- torch.median on empty tensor causes segfault #61656
- "RuntimeError: CUDA error: invalid configuration argument" when operating on some GPU tensors #48573
- aminmax increases dimension from 0-dim to 1-dim when given scalar inputs and keepdim. #64008
Performance
- max-sum operation #59545
- Concurrent minmax reduction operator #62164
- Massive Performance bottlenecks in some of the Reduce operations. #51509
- torch.median slower than torch.sort on cpu #51450
- Generalized CPU vector reductions #40570
- logsumexp: two little-impact perf suggestions (important because logsumexp is used for optimizing/fusing cross_entropy over large vocabs) #31837
- [ATen] mean operator is unvectorized on CPU #16617
- [perf] 10x improvement when doing x.sum(-1) manually #57610
- bool_tensor.sum(dtype=torch.int32) creates int32-copy of the original int8 tensor #55366
- torch median / nanmedian w/ nans speed #63870
- min/max require a huge allocation on GPU #63869
Feature Requests
- Implement missing torch.nan* operators #61474
- Dimension reducing variants of bitwise operations (bitwise_or, bitwise_and, bitwise_xor) #35641
- torch batchwise max with indices #32998
- logsumexp with subtraction #32097
- [FR] torch.dist along a dimension #18904
Allow specifying a range for dimensions to reduce over
- Add start_dim and end_dim functionality for common reduction operations. #54766
- Allow range in dim argument of reducing operations such as sum #32288
Miscellaneous
- Change make_reduction to reflect input resizing. #56764
- General reduction mode selection for in-place and out-variants for wider range (hopefully all) of ops #41793
- [doc] Tensor.mean: dtype kwarg is not documented #29758
- torch.Tensor.mean erroneously documented as sometimes returning a tuple #27312
- mean/sum(dtype) arg matching gives bad error message with positional dtype arg #25775