Quantile is limited to 16 million elements and has poor performance. #64947
Description
🐛 Bug
```python
x = torch.randn(17_000_000)
q = x.quantile(torch.tensor([0.1, 0.5]))
```
This throws an error that the tensor is too large: `RuntimeError: quantile() input tensor is too large.`
In addition to this, the performance is very poor. Looking at the C++ implementation on GitHub, a sort that returns the indices is used, which consumes a lot of memory and is very slow. I computed some quantiles of a 16 million element tensor and it took 2.3 s; the equivalent operation in NumPy took 0.2 s.
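A minimal benchmark sketch of the timing comparison above, reduced to 1 million elements so it stays under the size limit and runs quickly (the 2.3 s vs 0.2 s numbers in this report were measured at 16 million elements; the exact ratio will vary by machine):

```python
import time

import numpy as np
import torch

n = 1_000_000  # reduced from 16M for a quick illustrative run
x = torch.randn(n)
qs = torch.tensor([0.1, 0.5])

# Time torch.quantile.
t0 = time.perf_counter()
tq = x.quantile(qs)
torch_time = time.perf_counter() - t0

# Time np.quantile on the same data.
xn = x.numpy()
t0 = time.perf_counter()
nq = np.quantile(xn, [0.1, 0.5])
numpy_time = time.perf_counter() - t0

# Both default to linear interpolation, so the results should agree
# up to floating-point tolerance.
print(tq, nq, torch_time, numpy_time)
```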
To Reproduce
See above.
Expected behavior
That I can compute the quantile of very large tensors, that it requires much less memory than now and that it is about 10 times faster.
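As a stopgap until this is fixed, one possible workaround (a sketch, not the PyTorch implementation) is to pick the nearest order statistic with `torch.kthvalue`, which is not subject to the 16-million-element limit of `torch.quantile`, at the cost of skipping the linear interpolation between neighboring elements:

```python
import torch


def quantile_via_kthvalue(x: torch.Tensor, q: float) -> torch.Tensor:
    # Workaround sketch: approximate the q-quantile by the nearest
    # order statistic instead of interpolating between neighbors.
    n = x.numel()
    # Nearest 1-based rank for quantile q, clamped to [1, n].
    k = min(max(int(round(q * (n - 1))) + 1, 1), n)
    return torch.kthvalue(x.flatten(), k).values


# Works on tensors above the torch.quantile size limit.
x = torch.randn(17_000_000)
med = quantile_via_kthvalue(x, 0.5)
```

This trades the exact interpolated quantile for the nearest sample, which is usually acceptable at this tensor size.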
Environment
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19042-SP0
Is CUDA available: True
CUDA runtime version: 11.4.100
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 471.41
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect
cc @msaroufim @jerryzh168 @mruberry @rgommers @VitalyFedyunin @ngimel @heitorschueroff