Default XLA to use swap_tensors path in nn.Module._apply #126814
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126814
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 3 Unrelated Failures
As of commit 5aee159 with merge base 5196ef1:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks @mikaylagawarecki
def test_conv_empty_input(self, device, dtype):
    def help(input, conv, memory_format):
-       ref_out = conv(input)
+       ref_out = conv(input).detach()
These .detach() calls ensure the autograd graph is not alive during .to() (otherwise the refcount of the param will be more than 1, because the AccumulateGrad node holds a reference), which would prevent swap_tensors from being used.
As discussed, this is a known limitation of the swap_tensors path. Here it is mostly an artifact of how the test was written and seems unlikely to occur in practice (you don't normally want to change the dtype/device of your model while the autograd graph is alive).
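A minimal sketch of that scenario (the torch.__future__ opt-in is the public API; shapes and the exact failure behavior are assumptions and may vary by PyTorch version):

```python
import torch
import torch.nn as nn

# Opt in to the swap_tensors path globally (public torch.__future__ API).
torch.__future__.set_swap_module_params_on_conversion(True)

conv = nn.Conv2d(3, 3, kernel_size=3)
inp = torch.randn(2, 3, 8, 8)

out = conv(inp)               # the autograd graph is now alive, and its
                              # AccumulateGrad nodes hold an extra reference
                              # to each parameter
# conv.to(torch.float64)      # expected to fail here: a parameter cannot be
                              # swapped while its refcount is greater than 1

ref_out = conv(inp).detach()  # what the updated test does: keep only a
                              # detached result so no graph stays alive
del out                       # drop the earlier graph as well
conv.to(torch.float64)        # now _apply can swap the parameter tensors
```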
@JackCaoG wanted to double-check that you are okay with this limitation.
Yeah, I agree that in real life it is unlikely to happen.
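For background, the swap path in nn.Module._apply builds on the torch.utils.swap_tensors primitive, which exchanges two tensors in place so that existing references to each object observe the other's payload afterwards. A small, hedged demo of that primitive (values illustrative, not taken from this PR):

```python
import torch
from torch.utils import swap_tensors

t1 = torch.ones(2)
t2 = torch.zeros(3)

# Exchange the two tensors in place: t1 becomes the old t2 and vice versa,
# including dtype, shape, and storage. This requires that nothing else
# (e.g. an autograd node) holds an extra reference to either tensor.
swap_tensors(t1, t2)

print(t1)  # tensor([0., 0., 0.])
print(t2)  # tensor([1., 1.])
```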
…y and .to('meta')) (#126819)
Pull Request resolved: #126819
Approved by: https://github.com/albanD
ghstack dependencies: #126814
Sorry, my bad: upstream runs only a subset of the full XLA tests. I started to see the CI failure on our end. Is it ok to revert this while I debug the issue? Thanks!
@pytorchbot revert -m "broke xla ci" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
@izaitsevfb I no longer see the failure from D58015016 in the new import D58094197. Is it okay to re-merge?
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / macos-13-py3-arm64 / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -r
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed due to …
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed, the first few being: linux-binary-manywheel / manywheel-py3_8-cuda11_8-test / test and trunk / macos-13-py3-arm64 / build. Details for Dev Infra team: raised by workflow job.
The linux-binary-manywheel / manywheel-py3_8-cuda11_8-test / test (gh) failures are unrelated.
@pytorchbot merge -f "failures unrelated, see above comment" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…)
Pull Request resolved: pytorch#126814
Approved by: https://github.com/JackCaoG, https://github.com/albanD
ghstack dependencies: pytorch#127313
…y and .to('meta')) (pytorch#126819)
Pull Request resolved: pytorch#126819
Approved by: https://github.com/albanD
ghstack dependencies: pytorch#127313, pytorch#126814
….to_empty and .to('meta')) (pytorch#126819)"
This reverts commit fa426b0.
Reverted pytorch#126819 on behalf of https://github.com/izaitsevfb due to suspicious build instructions count regression, see [D58015016](https://www.internalfb.com/diff/D58015016) ([comment](pytorch#126814 (comment)))
…orch#126814)"
This reverts commit bfdec93.
Reverted pytorch#126814 on behalf of https://github.com/izaitsevfb due to suspicious build instructions count regression, see [D58015016](https://www.internalfb.com/diff/D58015016) ([comment](pytorch#126814 (comment)))
…6814)" (#128170)
#128165 :(
This reverts commit a7b1dd8.
Pull Request resolved: #128170
Approved by: https://github.com/drisspg, https://github.com/albanD
…orch#126814)" (pytorch#128170)
pytorch#128165 :(
This reverts commit a7b1dd8.
Pull Request resolved: pytorch#128170
Approved by: https://github.com/drisspg, https://github.com/albanD
Stack from ghstack (oldest at bottom):