Conversation

anijain2305 (Contributor) commented Oct 25, 2024

Stack from ghstack (oldest at bottom):

This is another unsound guard eval optimization. It's rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with a single parameter aliased to both inputs.
This further reduces guard overhead from 280 us to 240 us
for the model in #138386

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec
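
A minimal sketch of the calling pattern this optimization assumes away (illustrative only; `f`, `p1`, and `p2` are hypothetical, not code from this PR):

```python
import torch

@torch.compile
def f(a, b):
    return a + b

p1 = torch.nn.Parameter(torch.randn(4))
p2 = torch.nn.Parameter(torch.randn(4))

f(p1, p2)  # compiled with two distinct parameters as inputs
f(p1, p1)  # later call aliases one parameter to both inputs; skipping the
           # aliasing guard means this unsoundly reuses the first graph
```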

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138954

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit b399150 with merge base f9ae3fa:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ers"

This is another unsound guard eval optimization. Its rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with the same input as two different inputs
(aliasing). This further reduces guard overhead from 280 us to 240 us
for the model in #138386

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
…ers"

This is another unsound guard eval optimization. Its rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with the same input as two different inputs
(aliasing). This further reduces guard overhead from 280 us to 240 us
for the model in #138386

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
ezyang (Contributor) commented Oct 28, 2024

I am sympathetic to this but it doesn't seem like it should be taking 40us to test this in the first place

else:
    self.tensor_check_examples.append(value)
    self.tensor_check_names.append(tensor_name)
    self.tensor_check_guards.append(guard)
Reviewer (Contributor) commented:
There's a refactor going on here that makes it hard to see the substantive change that happened here

anijain2305 (Author) replied:
I can pull the refactor in a separate PR.

    add_code_part(code, gcl.guard, True)
    seen.add(code)

tensor_check_names = builder.tensor_check_names
Reviewer (Contributor) commented:
Does the code in torch/csrc/dynamo/guards.cpp referencing tensor_check_names have to be adjusted?

Also, since this happened before, can you explicitly make sure our verbose reporting (e.g., in logs and tlparse) did not regress with these changes? (The switch to cpp guard manager caused us to lose all tlparse guard output)
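
For context, one minimal way to spot-check that guard logging still appears after a change like this (a sketch assuming the standard `torch._logging` artifacts; not a procedure from this thread):

```python
import torch
import torch._logging

# Enable the "guards" logging artifact so installed guards are printed.
torch._logging.set_logs(guards=True)

@torch.compile
def f(x):
    return x * 2

f(torch.randn(4))  # the emitted guard list should still include the
                   # per-tensor checks; tlparse gives the structured view
```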

anijain2305 (Author) commented Oct 28, 2024

> I am sympathetic to this but it doesn't seem like it should be taking 40us to test this in the first place

The quantized model in question has a very large number of parameters, so we end up calling this guard for every parameter. There is not much going on in the guard itself; it's just the sheer quantity of them.
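
A toy stand-in for that shape of workload (assumed setup; `ManyParams` is hypothetical, not the quantized model from #138386):

```python
import torch

class ManyParams(torch.nn.Module):
    def __init__(self, n=512):
        super().__init__()
        # One small parameter per "layer": each gets its own tensor guard.
        self.params = torch.nn.ParameterList(
            torch.nn.Parameter(torch.randn(8)) for _ in range(n)
        )

    def forward(self, x):
        for p in self.params:
            x = x + p
        return x

m = torch.compile(ManyParams())
m(torch.randn(8))  # first call compiles and installs ~n tensor guards
m(torch.randn(8))  # every later call evaluates all of them, so even a
                   # cheap per-guard check scales linearly with n
```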

…ers"


This is another unsound guard eval optimization. Its rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with one parameter input as two different inputs
(aliasing). This further reduces guard overhead from 280 us to 240 us
for the model in #138386

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
…ers"


This is another unsound guard eval optimization. Its rare in practice to
compile a function with two different parameters as inputs, and then
later call the function with one parameter input as two different inputs
(aliasing). This further reduces guard overhead from 280 us to 240 us
for the model in #138386

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
anijain2305 requested a review from ezyang on October 28, 2024 17:35
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
This brings some unsoundness in guards. Earlier we were skipping the empty nn module hooks dict guard only on inbuilt nn modules, but as seen in #138386, there could still be significant guard overhead. With this PR, we reduce the guard eval latency from 420 us to 280 us (1.5x reduction).

Pull Request resolved: #138942
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #139040, #138954
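
For background on the hooks guard mentioned in that commit: every `nn.Module` carries hook dicts that can be populated after compilation, which is why skipping the emptiness guard is unsound (an illustrative sketch, not code from #138942):

```python
import torch

m = torch.nn.Linear(4, 4)
print(len(m._forward_hooks))  # 0: the hooks dict is empty by default

# Registering a hook later changes m's behavior, which the emptiness
# guard on the hooks dict is meant to catch.
m.register_forward_hook(lambda mod, inp, out: out * 2)
print(len(m._forward_hooks))  # 1: the module's output is now doubled
```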
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 29, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 29, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
github-actions bot deleted the gh/anijain2305/562/head branch November 29, 2024 02:12