
Gradient for embedding vector at padding_idx is not zero when on GPU #26302

Description

@azcohen14-zz

🐛 Bug

Backward passes generate nonzero gradients for the embedding vector at the padding_idx index. This only occurs on the GPU, and did not occur in version 1.0.1.post2.

To Reproduce

import torch
dev = torch.device('cuda')
torch.manual_seed(0)

# Indices in [0, 10); index 0 is the padding index of the embedding below.
random_ix = torch.randint(high=10, size=(256, 3, 7))
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)

embedding_layer.to(dev)
random_ix = random_ix.to(dev)

embeds = embedding_layer(random_ix)
merged = torch.sum(embeds, dim=2)
summed = merged.sum()
summed.backward()

# Gradient row for padding_idx; expected to be all zeros.
print(embedding_layer.weight.grad[0])

which outputs: tensor([505., 505., 505., 505., 505.], device='cuda:0')
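
Until a fix lands, one possible interim workaround (my own sketch, not an official fix) is to zero the padding row's gradient manually after each backward pass:

# Hypothetical workaround: overwrite the gradient row at padding_idx in place.
embedding_layer.weight.grad[embedding_layer.padding_idx].zero_()
print(embedding_layer.weight.grad[0])  # now tensor([0., 0., 0., 0., 0.], device='cuda:0')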

Expected behavior

tensor([0., 0., 0., 0., 0.], device='cuda:0')

Running on the CPU produces the expected behavior:

torch.manual_seed(0)

random_ix = torch.randint(high=10, size=(256, 3, 7))
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)

embeds = embedding_layer(random_ix)
merged = torch.sum(embeds, dim=2)
summed = merged.sum()
summed.backward()

print(embedding_layer.weight.grad[0])  # tensor([0., 0., 0., 0., 0.])
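
For completeness, a minimal side-by-side check (my own consolidation of the two snippets above, not part of the original report) that runs the same indices through fresh CPU and GPU embeddings and prints the padding row's gradient for each:

import torch

torch.manual_seed(0)
ix = torch.randint(high=10, size=(256, 3, 7))

for dev in (torch.device('cpu'), torch.device('cuda')):
    emb = torch.nn.Embedding(10, 5, padding_idx=0).to(dev)
    emb(ix.to(dev)).sum().backward()
    # Expected: all zeros on both devices; on an affected build the CUDA row is nonzero.
    print(dev, emb.weight.grad[0])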

Environment

  • PyTorch Version: 1.2
  • OS: Ubuntu 18.04.2 LTS
  • How you installed PyTorch: pipenv
  • Python version: 3.6.7
  • CUDA/cuDNN version: 10.0
  • GPU models and configuration: GeForce RTX 2080 Ti

cc @ezyang @gchanan @zou3519 @ssnl @albanD @gqchen

Metadata

Labels

  • high priority
  • module: autograd (Related to torch.autograd, and the autograd engine in general)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
