
Batched Triu And Tril Incorrect for Some Inputs #22581

@angusturner

Description

🐛 Bug

When passing a specific 4D tensor to triu or tril, the GPU implementation produces non-deterministic results, and the CPU implementation produces NaN values.

To Reproduce

This is the minimal reproducible example I could come up with:

import torch

x = torch.randn(1, 4, 4, 4)
x = x.transpose(0, 1)  # non-contiguous view with a batch dimension of size 4
for i in range(10):
    # note: on GPU the results are often different on each run;
    # on CPU the output is `nan`
    print(x.triu().sum())

Note that the issue does not seem to occur when:

  • the transpose is omitted (see the workaround sketch after this list)
  • x.size(0) > 1
  • the tensor has more or fewer dimensions
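
If the problem is indeed tied to the non-contiguous layout that transpose produces (my guess based on the observations above, not something confirmed in this report), forcing a contiguous copy before the call could be a workaround. A minimal sketch:

import torch

x = torch.randn(1, 4, 4, 4)
x = x.transpose(0, 1).contiguous()  # materialise a contiguous copy of the view
# expected to be deterministic and NaN-free if the contiguity guess is right
print(x.triu().sum())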

Expected behavior

Values should be deterministic and should not contain NaN.
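
For reference, a quick check along these lines (my own sketch, not part of the original report) that should pass once the bug is fixed:

import torch

x = torch.randn(1, 4, 4, 4).transpose(0, 1)
a, b = x.triu(), x.triu()
# repeated calls should agree exactly and contain no NaN values
print(torch.equal(a, b), torch.isnan(a).any().item())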

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 418.67
cuDNN version: 7501

Versions of relevant libraries:
[pip] numpy==1.16.1
[pip] torch==1.1.0
[pip] torchfile==0.1.0
[conda] torch 1.1.0
[conda] torchfile 0.1.0

Labels

high priority, triaged
