🐛 Bug
The backward pass generates nonzero gradients for the embedding vector at the padding_idx index. This only occurs on the GPU, and did not occur as of version 1.0.1.post2.
To Reproduce
import torch

dev = torch.device('cuda')
torch.manual_seed(0)

# Random indices in [0, 10); index 0 is the padding index
random_ix = torch.randint(high=10, size=(256, 3, 7))
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)
embedding_layer.to(dev)
random_ix = random_ix.to(dev)

embeds = embedding_layer(random_ix)
merged = torch.sum(embeds, dim=2)
summed = merged.sum()
summed.backward()

# The gradient row for padding_idx should be all zeros
print(embedding_layer.weight.grad[0])
which outputs: tensor([505., 505., 505., 505., 505.], device='cuda:0')
Expected behavior
tensor([0., 0., 0., 0., 0.], device='cuda:0')
Running on the CPU produces the expected behavior:
import torch

torch.manual_seed(0)
random_ix = torch.randint(high=10, size=(256, 3, 7))
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)
embeds = embedding_layer(random_ix)
merged = torch.sum(embeds, dim=2)
summed = merged.sum()
summed.backward()
print(embedding_layer.weight.grad[0])  # tensor([0., 0., 0., 0., 0.])
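Until this is fixed, one possible workaround (a sketch, not part of the original report) is to manually zero the gradient row at padding_idx after each backward pass and before the optimizer step:

import torch

dev = torch.device('cuda')
torch.manual_seed(0)
random_ix = torch.randint(high=10, size=(256, 3, 7)).to(dev)
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0).to(dev)

embedding_layer(random_ix).sum().backward()

# Workaround: clear the spurious gradient accumulated at padding_idx.
# nn.Embedding stores the index as the .padding_idx attribute.
with torch.no_grad():
    embedding_layer.weight.grad[embedding_layer.padding_idx].zero_()

print(embedding_layer.weight.grad[0])  # tensor([0., 0., 0., 0., 0.], device='cuda:0')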
Environment
- PyTorch Version: 1.2
- OS: Ubuntu 18.04.2 LTS
- How you installed PyTorch: pipenv
- Python version: 3.6.7
- CUDA/cuDNN version: 10.0
- GPU models and configuration: GeForce RTX 2080 Ti