[Inductor] fix a compilation time regression caused by user-visible output handling #139420
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139420

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 906c09a with merge base 73fde0d.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
ezyang left a comment:
Aww yeahhh, that's what I'm talking about!
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased.

@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…utput handling (pytorch#139420)

This PR fixes a compilation time regression manifested in timm_models/hrnet_w18, caused by pytorch#136732.

The regression is reproducible locally. The compilation time is a bit noisy, but it's still possible to tell the difference.

```
Before the offending PR
compilation_latency mean=176.022 seconds
compilation_latency mean=176.564 seconds

On the offending PR
compilation_latency mean=180.096 seconds
compilation_latency mean=179.101 seconds

On the fix
compilation_latency mean=173.153 seconds
compilation_latency mean=174.182 seconds
```

(I think the fix being faster than the baseline is due to noise.)

The cause of the regression is an inefficiency in `is_user_visible_output()`. Specifically, it used `output_node.args[0].index(node)` to obtain the output idx for each node (and we called this twice for each node). The offending PR assumed that `len(output_node.args[0])` is rather small, but the benchmark proved this false (it was 1900+ for timm_models/hrnet_w18).

The fix is to precompute `user_visible_output_strides` once by iterating only over the nodes in `output_node.args[0]`.

Pull Request resolved: pytorch#139420
Approved by: https://github.com/ezyang
Stack from ghstack (oldest at bottom):
This PR fixes a compilation time regression manifested in timm_models/hrnet_w18, caused by #136732.
The regression is reproducible locally. The compilation time is a bit noisy, but it's still possible to tell the difference.
(I think the fix being faster than the baseline is due to noise)
The cause of the regression is an inefficiency in `is_user_visible_output()`. Specifically, it used `output_node.args[0].index(node)` to obtain the output idx for each node (and we called this twice for each node). The offending PR assumed that `len(output_node.args[0])` is rather small, but the benchmark proved this false (it was 1900+ for timm_models/hrnet_w18).

The fix is to precompute `user_visible_output_strides` once by iterating only over the nodes in `output_node.args[0]`.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
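The performance issue the PR describes can be sketched in isolation. This is an illustrative simplification, not the actual Inductor code: the node objects and helper names here (`output_idx_slow`, `precompute_output_indices`) are stand-ins, while `is_user_visible_output()` and `user_visible_output_strides` are the names from the PR text. The point is that `list.index()` is a linear scan, so calling it per output node makes the pass quadratic in the number of graph outputs, whereas a single precomputed mapping is linear overall.

```python
# Before (simplified): each lookup scans output_args linearly, so doing
# this once (or twice) per node is O(n^2) across all n outputs. With
# 1900+ outputs (timm_models/hrnet_w18), that overhead became visible
# in compilation time.
def output_idx_slow(output_args, node):
    return output_args.index(node)  # linear scan on every call

# After (simplified): build a node -> index mapping once by iterating
# over output_args a single time, O(n) total; each lookup is then O(1).
def precompute_output_indices(output_args):
    return {node: idx for idx, node in enumerate(output_args)}

# Stand-in "nodes" for demonstration.
outputs = [f"node_{i}" for i in range(2000)]
indices = precompute_output_indices(outputs)

# Both approaches agree on the index; only the cost differs.
assert indices["node_1234"] == output_idx_slow(outputs, "node_1234")
```

The same idea underlies the fix: instead of asking "what is this node's output index?" from inside `is_user_visible_output()`, the strides for user-visible outputs are collected once up front.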