
at::parallel_for does not propagate thread-local variables to child threads in embedding_renorm_ #28370

Description

@albanD

See repro below:

import torch

def run(val):
    print("Running :", val)
    weights = torch.rand(100, 64, requires_grad=True)
    # `val` random indices into the 100 rows of `weights`; the element
    # count controls whether the kernel takes the multi-threaded path.
    inp = torch.rand(val).mul(100).long()
    with torch.no_grad():
        # In-place renorm: under no_grad() this must not record autograd history.
        torch.embedding_renorm_(weights, inp, 1.0, 2)
    print("This should be None: ", weights.grad_fn)

run(1000)  # grad_fn stays None
run(1001)  # a CopySlices grad_fn leaks despite no_grad()

Outputs:

Running : 1000
This should be None:  None
Running : 1001
This should be None:  <CopySlices object at 0x7f02faa5e7a8>

The current supposition is that the interaction between multi-threaded operations and the thread-local NoGradGuard is the problem: at::parallel_for does not propagate the guard to its worker threads, so work executed on those threads records autograd history even inside a no_grad() block.
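A minimal sketch of the suspected mechanism: grad mode in PyTorch is thread-local, so a freshly spawned thread starts with the default (enabled) state rather than inheriting the parent's no_grad() state. If at::parallel_for's worker threads behave like the plain Python thread below, the in-place copy inside embedding_renorm_ would run with gradients enabled on those workers:

import threading

import torch

def worker():
    # Grad mode is thread-local: this new thread starts with the
    # default (enabled) state, not the parent's no_grad() state.
    print("worker sees grad enabled:", torch.is_grad_enabled())  # True

with torch.no_grad():
    print("main sees grad enabled:", torch.is_grad_enabled())  # False
    t = threading.Thread(target=worker)
    t.start()
    t.join()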

cc @ezyang @gchanan @zou3519 @jerryzh168 @ssnl @albanD @gqchen @VitalyFedyunin @ngimel @mruberry

Labels: high priority, module: autograd, module: internals, module: multithreading, module: performance, triaged
