Debug test failure for separate I, W execution #138863
Ran into issues (#138863) when adding a Schedule with a single stage, so adding code to support this edge case (mostly for test purposes).

Pull Request resolved: #138925
Approved by: https://github.com/wconstab
Stack from ghstack (oldest at bottom):
The test added in the previous PR runs a simple schedule on one rank with a single pipeline-parallel stage. The schedule has 2 microbatches and runs the F, I, and W ops separately for each microbatch. The test then compares `.grad` for the pipelined module against a reference module.
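For reference, the final check is conceptually equivalent to the sketch below. This is not the actual test code; `pipelined_mod` and `ref_mod` are hypothetical stand-ins, and in the real test the pipelined side's grads come from the F/I/W schedule rather than a plain `.backward()`:

```python
import torch

# Hypothetical stand-ins for the pipelined and reference modules; both
# start from identical weights and see the same input.
pipelined_mod = torch.nn.Linear(8, 8)
ref_mod = torch.nn.Linear(8, 8)
ref_mod.load_state_dict(pipelined_mod.state_dict())

x = torch.randn(4, 8)
pipelined_mod(x).sum().backward()  # in the real test: the F/I/W schedule
ref_mod(x).sum().backward()        # reference grads via a plain backward

# The comparison the test performs; the observed failure mode is p.grad
# being None on the pipelined side.
for (name, p), (_, p_ref) in zip(
    pipelined_mod.named_parameters(), ref_mod.named_parameters()
):
    assert p.grad is not None, f"{name}.grad is None"
    torch.testing.assert_close(p.grad, p_ref.grad)
```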
The test fails because the pipelined module has `None` grads. The debug logs show that the executor is running the I and W steps, and additional print statements show that the underlying `backward_*` functions are being called as expected.
I identified that the `param_groups` saved during `backward_input` were an empty list, and I believe that is why no gradients are computed during backward; this contradicts one comment in the code. There is a special case for stage 0 that executes 'full' backward instead of 'weight' backward when running `backward_weight_one_chunk`. I tried running both the if and else branches of this logic, with the same result: `None` grads. The special case seems to fit my scenario, since I only have 1 stage, so it is stage 0. But running 'full' backward for dW does not seem to help.
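For context, the split backward is conceptually a two-phase `torch.autograd.grad` call. This is only a minimal sketch, not the actual `backward_input`/`backward_weight_one_chunk` implementation, but it shows why an empty saved parameter list at the weight step would leave every `.grad` as `None`:

```python
import torch

mod = torch.nn.Linear(8, 8)
x = torch.randn(4, 8, requires_grad=True)
loss = mod(x).sum()

# Phase 1 ("I" / input backward): grads w.r.t. the stage input only,
# retaining the graph for the later weight pass.
(dI,) = torch.autograd.grad(loss, inputs=(x,), retain_graph=True)

# Phase 2 ("W" / weight backward): grads w.r.t. the parameters. If the
# saved parameter list were empty (as observed with the empty
# 'param_groups'), there would be nothing to differentiate against here
# and every .grad would stay None.
params = list(mod.parameters())
dWs = torch.autograd.grad(loss, inputs=params)
for p, g in zip(params, dWs):
    p.grad = g if p.grad is None else p.grad + g

assert all(p.grad is not None for p in mod.parameters())
```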
Repro:

```
TORCH_LOGS=+pp python test/distributed/pipelining/test_schedule.py -k test_grad_with_split_b_w
```