Zero gradients beyond a certain buffer size on CUDA #22843

@calincru

🐛 Bug

There is an issue with how blocks/buffers are allocated on CUDA. In the example below it leads to zeroed gradients for the matrices beyond the 65535th entry.

This only seems to affect the nightly build.

To Reproduce

In the example below the assert fails because the last entry is all zeros instead of the identity matrix. Decrement n by 1 and it works as expected.

import torch
n = (1 << 16)  # 65536; with (1 << 16) - 1 the assert passes
x = torch.rand(n, 2, 2, device='cuda').requires_grad_()
a = x[:, [0], [0]]
b = x[:, [1], [1]]
s = (a + b).sum()
s.backward()
assert torch.allclose(x.grad, torch.eye(2, out=x.new(2, 2)))  # fails
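
For reference, the gradient the assert expects can be worked out by hand. This is only a sketch of the arithmetic behind the snippet above, writing x_i for the i-th 2x2 matrix in the batch:

```latex
\[
s \;=\; \sum_{i=0}^{n-1} \bigl( x_{i,0,0} + x_{i,1,1} \bigr)
  \;=\; \sum_{i=0}^{n-1} \operatorname{tr}(x_i),
\qquad
\frac{\partial s}{\partial x_{i,j,k}} \;=\; \delta_{jk}
\;\;\Longrightarrow\;\;
\frac{\partial s}{\partial x_i} \;=\; I_2 \quad \text{for every } i .
\]
```

So every x.grad[i] should be the 2x2 identity, independent of n, which is why a zero matrix at the end of the batch indicates a bug rather than a legitimate result.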

It's probably related to the following snippet, which on the nightly build works for n up to 2^19 - 7, whereas the limit on the latest release is 2^16:

import torch
n = (1 << 19) - 7 # - 1
x = torch.rand(n, 2, 2, device='cuda')
x = (x @ x.transpose(-2, -1)).add_(torch.eye(2, out=x.new(2, 2)))
y = x.detach().requires_grad_()
l = y.cholesky()  # <-- fails here
logdet = 2 * l.diagonal(dim1=1, dim2=2).log().sum()
logdet.backward()
assert torch.allclose(y.grad, y.inverse())
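
The final assert relies on a standard identity for the log-determinant of a symmetric positive-definite matrix. As a short sketch, with Y standing for one matrix of the batch y and L for its Cholesky factor:

```latex
\[
Y = L L^{\top}
\;\;\Longrightarrow\;\;
\log\det Y \;=\; 2 \sum_{j} \log L_{jj},
\qquad
\frac{\partial \log\det Y}{\partial Y} \;=\; Y^{-\top} \;=\; Y^{-1}
\quad (\text{since } Y = Y^{\top}).
\]
```

Because logdet in the snippet sums this quantity over the whole batch, the gradient with respect to each y[i] should be y[i].inverse().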

Note that here, apart from the different limit, I get an error once the threshold is exceeded: RuntimeError: CUDA error: invalid configuration argument. (Its meaning becomes clear after pasting it into a search engine.) I would expect a similar error for the first code snippet too, rather than silently zeroed gradients.
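
CUDA reports "invalid configuration argument" when a kernel launch configuration exceeds a device limit, and 65535 is the maximum size of the y and z grid dimensions. Whether that particular limit is what is being hit here is an assumption, not something confirmed in this report. Under that assumption, a possible stopgap is to split the batch into pieces of at most 65535 matrices. The helper below is a minimal, hypothetical sketch (the names `batch_chunks` and `ASSUMED_MAX_BATCH` are made up for illustration) that only computes the index ranges:

```python
from typing import Iterator, Tuple

# Assumption: 65535 is the CUDA cap on the y/z grid dimensions. It is not
# confirmed that this is the limit behind the behaviour reported above.
ASSUMED_MAX_BATCH = 65535


def batch_chunks(n: int, limit: int = ASSUMED_MAX_BATCH) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) ranges covering range(n), each at most `limit` long."""
    if n < 0 or limit <= 0:
        raise ValueError("n must be >= 0 and limit must be > 0")
    for start in range(0, n, limit):
        yield start, min(start + limit, n)


if __name__ == "__main__":
    # A batch of 1 << 16 matrices is split into one full chunk and one leftover.
    print(list(batch_chunks(1 << 16)))  # [(0, 65535), (65535, 65536)]
```

Each range could then be used to slice the batch, e.g. x[start:stop], and run the failing operation per slice. This has not been verified to avoid the zeroed gradients and is not a substitute for a proper fix.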

Environment

Last nightly build with CUDA 10 and cuDNN 7.5.


    Labels

    high priority, module: autograd, module: build, module: cuda, triage review, triaged
