[dynamo][guards] Skip no tensor aliasing guards on parameters #138954
Conversation
This is another unsound guard eval optimization. It's rare in practice to compile a function with two different parameters as inputs, and then later call the function with the same tensor passed as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in #138386 [ghstack-poisoned]
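For illustration, a minimal sketch of the aliasing scenario described above; the function and tensor names are made up and this is not code from the PR:

```python
import torch

# A function compiled with two distinct parameter tensors as inputs.
@torch.compile
def combine(a, b):
    return a + b

p = torch.nn.Parameter(torch.randn(4))
q = torch.nn.Parameter(torch.randn(4))

combine(p, q)  # compiled with two different parameters as inputs
combine(p, p)  # aliased call: the same parameter passed as two inputs; the
               # no-tensor-aliasing guard would normally fail here and force a
               # recompile, but with this change it is skipped for parameters
```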
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138954
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit b399150 with merge base f9ae3fa.
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ers" This is another unsound guard eval optimization. Its rare in practice to compile a function with two different parameters as inputs, and then later call the function with the same input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in #138386 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
…ers" This is another unsound guard eval optimization. Its rare in practice to compile a function with two different parameters as inputs, and then later call the function with the same input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in #138386 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
I am sympathetic to this, but it doesn't seem like it should be taking 40 us to test this in the first place.
torch/_dynamo/guards.py
Outdated
else:
    self.tensor_check_examples.append(value)
    self.tensor_check_names.append(tensor_name)
    self.tensor_check_guards.append(guard)
There's a refactor going on here that makes it hard to see the substantive change.
I can pull the refactor out into a separate PR.
torch/_dynamo/guards.py
Outdated
add_code_part(code, gcl.guard, True)
seen.add(code)

tensor_check_names = builder.tensor_check_names
Does the code in torch/csrc/dynamo/guards.cpp referencing tensor_check_names have to be adjusted?
Also, since this happened before, can you explicitly make sure our verbose reporting (e.g., in logs and tlparse) did not regress with these changes? (The switch to cpp guard manager caused us to lose all tlparse guard output)
The quantized model in question has a really large number of parameters, and therefore we end up calling this guard for every parameter. There is not much going on in the guard itself. It's just the sheer quantity of them.
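To make the cost model concrete, here is a rough Python sketch of the kind of check involved (not the actual implementation in torch/csrc/dynamo/guards.cpp):

```python
# Hypothetical equivalent of a no-tensor-aliasing check: each tensor is cheap
# to test, but a model with thousands of parameters runs this on every guard
# evaluation, so the total cost is dominated by the number of tensors checked.
def no_tensor_aliasing(tensors):
    seen = set()
    for t in tensors:
        if id(t) in seen:  # same tensor object appears twice -> aliasing
            return False
        seen.add(id(t))
    return True
```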
…ers" This is another unsound guard eval optimization. Its rare in practice to compile a function with two different parameters as inputs, and then later call the function with one parameter input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in #138386 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
…ers" This is another unsound guard eval optimization. Its rare in practice to compile a function with two different parameters as inputs, and then later call the function with one parameter input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in #138386 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
This brings some unsoundness in guards. Earlier we were skipping the empty nn module hooks dict guard only on inbuilt nn modules, but as seen in #138386, there could still be significant guard overhead. With this PR, we reduce the guard eval latency from 420 us to 280 us (1.5x reduction). Pull Request resolved: #138942 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #139040, #138954
…h#138954) This is another unsound guard eval optimization. It's rare in practice to compile a function with two different parameters as inputs, and then later call the function with one parameter input as two different inputs (aliasing). This further reduces guard overhead from 280 us to 240 us for the model in pytorch#138386 Pull Request resolved: pytorch#138954 Approved by: https://github.com/jansel ghstack dependencies: pytorch#139040
Stack from ghstack (oldest at bottom):
This is another unsound guard eval optimization. It's rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with one parameter input as two different inputs
(aliasing). This further reduces guard overhead from 280 us to 240 us
for the model in #138386
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec