
[jit] Autodiff bug on double backward for early expiration of grad_accumulator  #19769

@wanchaol

Description

🐛 Bug

Our autodiff infrastructure has a bug in double backward: the grad_accumulator expires early for some AD formulas (e.g. layer_norm, linear).

It gives an error like the one below:

  File "test/test_jit.py", line 659, in checkTrace
    grads2_ge = torch.autograd.grad(l2_ge, flattened_recording_inputs, allow_unused=allow_unused)
  File "/scratch/wanchaol/local/pytorch/torch/autograd/__init__.py", line 149, in grad
    inputs, allow_unused)
RuntimeError: No grad accumulator for a saved leaf!

This is happening on master now. The bug is somewhat complicated to trigger: after we add a non-trivial AD formula, some non-trivial models throw the above error during double backward, while the same code runs fine in pure autograd mode.
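For context, below is a minimal sketch (not the actual test from the repro steps) of the double-backward pattern that exercises this code path; it assumes a fused AD formula for linear is active in the JIT, and the shapes/warm-up loop are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def fn(x, w, b):
    return F.linear(x, w, b).sum()

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(16, 8, requires_grad=True)
b = torch.randn(16, requires_grad=True)

# Warm up so the JIT executor switches to its differentiable-graph path.
for _ in range(3):
    torch.autograd.grad(fn(x, w, b), (x, w, b), create_graph=True)

out = fn(x, w, b)
# First backward, keeping the graph so it can be differentiated again.
gx, gw, gb = torch.autograd.grad(out, (x, w, b), create_graph=True)
# Second backward; in the failing case this is where
# "RuntimeError: No grad accumulator for a saved leaf!" is raised.
grads2 = torch.autograd.grad(gx.sum() + gw.sum() + gb.sum(), (x, w, b),
                             allow_unused=True)
```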

To Reproduce

I first noticed the problem in #20284, after adding an AD formula for linear in #20039.

  1. git fetch origin; git checkout gh/wanchaol/5/origin
  2. python setup.py develop
  3. python test/test_jit.py TestEndToEndHybridFrontendModels.test_vae

Expected behavior

The test exercises both plain autograd mode and JIT autodiff mode. Autograd mode works fine, but the second derivative under JIT autodiff raises the error above, which I think is related to our autodiff infrastructure releasing one of the leaf Variables' grad accumulators early.
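A rough sketch of that comparison is below; the helper names are illustrative, not the real test_jit.checkTrace internals. Second derivatives are computed the same way in eager and scripted mode, and only the scripted path raises the error in the failing case.

```python
import torch
import torch.nn.functional as F

def second_grads(fn, inputs):
    # First backward with create_graph=True so it can be differentiated again.
    l1 = fn(*inputs).sum()
    grads = torch.autograd.grad(l1, inputs, create_graph=True, allow_unused=True)
    l2 = sum(g.sum() for g in grads if g is not None)
    # Second backward over the sum of first-order grads.
    return torch.autograd.grad(l2, inputs, allow_unused=True)

def model(x, w, b):
    return F.linear(x, w, b)

inputs = tuple(torch.randn(*shape, requires_grad=True)
               for shape in [(4, 8), (16, 8), (16,)])

eager_grads2 = second_grads(model, inputs)                       # works
jit_grads2 = second_grads(torch.jit.script(model), inputs)       # errors in the failing case
```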

Labels

high priority, oncall: jit, triaged
