-
Notifications
You must be signed in to change notification settings - Fork 26.3k
fix(dynamo): Optimizer._init_group did not handle return value
#110709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(dynamo): Optimizer._init_group did not handle return value
#110709
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110709
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 7a93eb6 with merge base 3db0095 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Optimizer._init_group did not handle valueOptimizer._init_group did not handle return value
|
needs test |
|
Outdated DetailsAdded test but unable to prove whether guards are necessary.I think they are - I will add some ret value control flow and prove that no recompilation takes place resulting in incorrect result. For our purposes though, actually ret value should never change given the parameters are only updated in place. The same cannot be said of functional API but we do not prioritize this for compilation path. I'll try to analyze the exact guards involved in Optimizer compilation (e.g. parameter properties like is_sparse and is_complex. We may assert that Tensor metas are the same in the args source guards). |
|
Btw @mlazos , do note that this line is very strict: pytorch/torch/_dynamo/variables/optimizer.py Line 128 in 882bc17
Currently, some Do you agree @janeyx99. I think you informed me about this omission in #111008 (comment), but I didn't realize how bad it might be in terms of perf. Btw, what if we run out of cache size? Will we fallback on eager? That's even worse... |
Agreed that we should do this sooner rather than later, though thankfully the optimizers where this perf hit is significant are not usually in the hotpath where perf is critical. I'm about to go on a trip for 3+ weeks where I won't be able to work, so I've been busy tying up some loose ends in preparation for that. In that time, if anyone decides to take up the task, @mikaylagawarecki would be great as a reviewer! |
|
Hello @mlazos @mikaylagawarecki @janeyx99 I've improved the tests and split out the incremental refactor work into tracking issue #111331 Hope we can get this merged to unblock correctness issue on main. (main will currently not convert complex dtypes due to the bug this PR fixes) |
ezyang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK you know what, whatever, this is incrementally better, even though the titanic is sinking
|
@pytorchbot merge |
Merge failedReason: This PR needs a If not, please add the To add a label, you can comment to pytorchbot, for example For more information, see Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot label "topic: not user facing" |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…orch#110709) blocks: pytorch#110706 Causes a bug for all optimizers that use _init_group return value. `compile` + _init_group ret value is not on testing path. So we also add test. Pull Request resolved: pytorch#110709 Approved by: https://github.com/ezyang
…orch#110709) blocks: pytorch#110706 Causes a bug for all optimizers that use _init_group return value. `compile` + _init_group ret value is not on testing path. So we also add test. Pull Request resolved: pytorch#110709 Approved by: https://github.com/ezyang
blocks: #110706
Causes a bug for all optimizers that use _init_group return value.
compile+ _init_group ret value is not on testing path. So we also add test.cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng @lezcano @mlazos