Description
🐛 Bug
When passing a specific 4D tensor to triu or tril, the GPU implementation produces non-deterministic results. The CPU implementation produces nan values.
To Reproduce
This is the minimal reproducible example I could come up with:

```python
import torch

x = torch.randn(1, 4, 4, 4)
x = x.transpose(0, 1)
for i in range(10):
    # note: results are often different on each run
    # or on CPU, outputs `nan`
    print(x.triu().sum())
```

Note that the issue does not seem to occur in any of the following cases (sketched below):
- the transpose is omitted
- if `x.size(0) > 1`
- if the tensor has more or fewer dimensions
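For reference, a sketch of the variants that, per the list above, do not seem to trigger the problem. The shapes are illustrative choices on my part, and I am assuming `x.size(0)` refers to the tensor before the transpose:

```python
import torch

# 1) no transpose
a = torch.randn(1, 4, 4, 4)
print(a.triu().sum())

# 2) leading dimension larger than 1 before the transpose
b = torch.randn(2, 4, 4, 4).transpose(0, 1)
print(b.triu().sum())

# 3) fewer dimensions (3D) or more dimensions (5D)
c = torch.randn(1, 4, 4).transpose(0, 1)
d = torch.randn(1, 2, 4, 4, 4).transpose(0, 1)
print(c.triu().sum(), d.triu().sum())
```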
Expected behavior
The result should be deterministic and should not contain NaN values.
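A minimal sketch of the check I would expect to pass, using the same repro tensor as above (the variable names here are mine, not from the report):

```python
import torch

x = torch.randn(1, 4, 4, 4).transpose(0, 1)

first = x.triu()
# expected behavior 1: no NaN values in the output
assert not torch.isnan(first).any(), "triu produced NaN values"

for _ in range(10):
    out = x.triu()
    # expected behavior 2: repeated calls on the same input are bitwise identical
    assert torch.equal(out, first), "triu result changed between runs"
```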
Environment
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 418.67
cuDNN version: 7501
Versions of relevant libraries:
[pip] numpy==1.16.1
[pip] torch==1.1.0
[pip] torchfile==0.1.0
[conda] torch 1.1.0
[conda] torchfile 0.1.0