Description
🐛 Bug
There is an issue with how blocks/buffers are allocated on CUDA. This leads to zeroed gradients in the example below for the matrices beyond the 65535th entry.
This only seems to affect the nightly build.
To Reproduce
In the example below, the assert fails because the last entry is all zeros instead of the identity matrix. Decrement n by 1 and it works as expected.
import torch
n = (1 << 16) # - 1
x = torch.rand(n, 2, 2, device='cuda').requires_grad_()
a = x[:, [0], [0]]
b = x[:, [1], [1]]
s = (a + b).sum()
s.backward()
assert torch.allclose(x.grad, torch.eye(2, out=x.new(2, 2)))  # fails

It's probably related to the following, which works on the nightly build for n up to 2^19 - 7, while the limit on the latest release is 2^16:
n = (1 << 19) - 7 # - 1
x = torch.rand(n, 2, 2, device='cuda')
x = (x @ x.transpose(-2, -1)).add_(torch.eye(2, out=x.new(2, 2)))
y = x.detach().requires_grad_()
l = y.cholesky() # <-- fails here
logdet = 2 * l.diagonal(dim1=1, dim2=2).log().sum()
logdet.backward()
assert torch.allclose(y.grad, y.inverse())

Note that here, other than the change in the limit, I get an error when the threshold is passed: RuntimeError: CUDA error: invalid configuration argument. (Which becomes clear after one pastes it in their favorite search engine.) So I would expect something like this for the first code snippet too.
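For reference, a small probe along these lines (my own sketch; grad_is_correct is a hypothetical helper, not part of the snippets above) can be used to check that the first snippet silently zeroes the trailing gradients once the threshold is crossed, rather than raising:

import torch

def grad_is_correct(n):
    # Rebuild the first snippet for a given batch size n and check whether
    # the gradient of the last batch entry is still the identity matrix.
    x = torch.rand(n, 2, 2, device='cuda').requires_grad_()
    s = (x[:, [0], [0]] + x[:, [1], [1]]).sum()
    s.backward()
    return torch.allclose(x.grad[-1], torch.eye(2, device='cuda'))

# Scan around the suspected 2^16 boundary; per the observation above this
# prints True up to n = 65535 and False from n = 65536 on, with no error raised.
for n in range(65534, 65538):
    print(n, grad_is_correct(n))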
Environment
Latest nightly build with CUDA 10 and cuDNN 7.5.
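Additional context
Until this is fixed, a possible workaround (my own untested sketch, assuming the failure is tied to the leading-dimension size seen by a single advanced-indexing backward) is to accumulate the loss in chunks that stay below the 2^16 boundary:

import torch

CHUNK = 1 << 15  # stay well below the suspected 2^16 limit

n = 1 << 16
x = torch.rand(n, 2, 2, device='cuda').requires_grad_()

# Build the loss chunk by chunk so that each advanced-indexing backward only
# ever sees a leading dimension below the threshold.
s = x.new_zeros(())
for start in range(0, n, CHUNK):
    chunk = x[start:start + CHUNK]
    s = s + (chunk[:, [0], [0]] + chunk[:, [1], [1]]).sum()
s.backward()

assert torch.allclose(x.grad, torch.eye(2, out=x.new(2, 2)))  # should pass if chunking sidesteps the issue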