torch.nonzero slower than np.nonzero  #14848

@igormq

🐛 Bug

torch.nonzero is slower than np.nonzero.

Object detection libraries such as maskrcnn_benchmark rely heavily on this function to select proposals, so the slowdown may increase inference time. Critical parts of PyTorch, such as indexing, also rely on torch.nonzero.

To Reproduce

1D tensor of size 512

import numpy as np
import torch
data = np.random.randn(512)
t_data = torch.as_tensor(data)
ct_data = torch.as_tensor(data, device='cuda')

%timeit np.nonzero(data)
4.02 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit torch.nonzero(t_data)
23.7 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit torch.nonzero(ct_data)
31.6 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

ND tensor of shape (16, 3, 512)

import numpy as np
import torch
data = np.random.randn(16, 3, 512)
t_data = torch.as_tensor(data)
ct_data = torch.as_tensor(data, device='cuda')

%timeit np.nonzero(data)
270 µs ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit torch.nonzero(t_data)
3.09 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit torch.nonzero(ct_data)
38.9 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
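
The timings above use the IPython `%timeit` magic. For anyone who wants to reproduce them from a plain Python script, here is a minimal sketch built on the standard-library `timeit` module. The names `bench` and `main` are hypothetical helpers, not part of NumPy or PyTorch, and the explicit `torch.cuda.synchronize()` is a conservative assumption to make sure queued GPU work is included in the measurement.

```python
import statistics
import timeit


def bench(fn, repeat=7, number=1000):
    """Return (mean, stdev) of the per-call time in microseconds."""
    # Each entry of `totals` is the wall time for `number` consecutive calls.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call_us = [t / number * 1e6 for t in totals]
    return statistics.mean(per_call_us), statistics.stdev(per_call_us)


def main():
    # Imported here so that `bench` stays usable without NumPy or PyTorch.
    import numpy as np
    import torch

    data = np.random.randn(16, 3, 512)
    t_data = torch.as_tensor(data)

    cases = {
        "np.nonzero": lambda: np.nonzero(data),
        "torch.nonzero (cpu)": lambda: torch.nonzero(t_data),
    }

    if torch.cuda.is_available():
        ct_data = torch.as_tensor(data, device="cuda")

        def cuda_case():
            torch.nonzero(ct_data)
            # Assumption: wait for outstanding GPU work before the timer stops.
            torch.cuda.synchronize()

        cases["torch.nonzero (cuda)"] = cuda_case

    for name, fn in cases.items():
        mean, std = bench(fn)
        print(f"{name}: {mean:.2f} µs ± {std:.2f} µs per loop")
```

Calling `main()` prints one line per case. The absolute numbers will depend on the hardware and library versions, so they are only comparable within a single run.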

Expected behavior

The CPU implementation of torch.nonzero should perform similarly to np.nonzero, while the GPU implementation should be faster (at least for high-dimensional tensors).
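
One caveat when comparing the two functions: they do not return the same layout. np.nonzero returns a tuple with one index array per dimension, whereas torch.nonzero by default returns a single 2-D tensor with one row per nonzero element. On the NumPy side, the row-per-element layout is what np.argwhere produces. The sketch below uses only NumPy to illustrate the difference; it is not a claim about where the time in either implementation is actually spent.

```python
import numpy as np

data = np.random.randn(16, 3, 512)

# np.nonzero: a tuple with one 1-D index array per dimension.
per_dim = np.nonzero(data)

# np.argwhere: one row per nonzero element, shape (N, ndim).
# This is the layout that torch.nonzero returns by default.
rows = np.argwhere(data)

# Both describe the same set of indices, just arranged differently.
assert len(per_dim) == data.ndim
assert rows.shape == (per_dim[0].size, data.ndim)
assert np.array_equal(rows, np.stack(per_dim, axis=1))
```

A like-for-like benchmark might therefore also time np.argwhere against torch.nonzero.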

Environment

Collecting environment information...
PyTorch version: 1.0.0.dev20181024
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080
GPU 3: GeForce GTX 1080

Nvidia driver version: 390.48
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
/usr/local/cuda-8.0/lib64/libcudnn.so
/usr/local/cuda-8.0/lib64/libcudnn.so.5
/usr/local/cuda-8.0/lib64/libcudnn.so.5.1.5
/usr/local/cuda-8.0/lib64/libcudnn.so.6
/usr/local/cuda-8.0/lib64/libcudnn.so.6.0.21
/usr/local/cuda-8.0/lib64/libcudnn_static.a
/usr/local/cuda-9.1/lib64/libcudnn.so
/usr/local/cuda-9.1/lib64/libcudnn.so.7
/usr/local/cuda-9.1/lib64/libcudnn.so.7.1.3
/usr/local/cuda-9.1/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] msgpack-numpy (0.4.3.2)
[pip] numpy (1.15.4)
[pip] pytorch-ignite (0.1.0)
[pip] torch (1.0.0.dev20181024)
[pip] torchaudio (0.1)
[pip] torchtext (0.3.1)
[pip] torchvision (0.2.1)
[pip] torchvision-nightly (0.2.1)
[conda] pytorch                   0.4.1           py36_py35_py27__9.0.176_7.1.2_2    pytorch
[conda] pytorch-nightly           1.0.0.dev20181024 py3.6_cuda9.0.176_cudnn7.1.2_0    pytorch
[conda] torchaudio                0.1                       <pip>
[conda] torchtext                 0.3.1                     <pip>
[conda] torchvision               0.2.1                    py36_1    pytorch
[conda] torchvision-nightly       0.2.1                     <pip>

Labels

module: bootcamp
module: performance
triaged
