[dynamo][guards] Skip guards on empty nn module hooks #138942

anijain2305 · 2024-10-25T21:37:52Z

Stack from ghstack (oldest at bottom):

This introduces some unsoundness in guards. Previously, we skipped the empty nn module hooks dict guard only on inbuilt nn modules, but as seen in #138386, there could still be significant guard overhead. This PR extends the skip to user-defined nn modules as well. With this PR, guard eval latency drops from 420 us to 280 us (a 1.5x reduction).
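As a quick sanity check on the quoted numbers, the arithmetic can be worked through as below. Only the 420 us and 280 us figures come from this PR; the variable names are illustrative.

```python
# Latency figures quoted in the PR description (microseconds).
before_us = 420.0
after_us = 280.0

# Ratio of old to new latency, i.e. the "1.5x reduction".
speedup = before_us / after_us

# Absolute and relative time saved per guard evaluation.
saved_us = before_us - after_us
saved_fraction = saved_us / before_us

print(f"speedup: {speedup:.2f}x")
print(f"saved: {saved_us:.0f} us ({saved_fraction:.1%} of the original latency)")
```

In other words, a 1.5x reduction means each guard evaluation takes about one third less time than before.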

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

[ghstack-poisoned]

pytorch-bot · 2024-10-25T21:37:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138942

❌ 1 New Failure, 1 Unrelated Failure

As of commit 5ca3b52 with merge base f9ae3fa:

NEW FAILURE - The following job has failed:

Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for torch/fx/experimental/sym_node.py:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral) (gh) (trunk failure)
[ FAILED ] ListTestIValueBasedList.whenMoveConstructingList_thenOldIsUnchanged

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang · 2024-10-28T02:12:58Z

Are you going to allow controlling the unsoundness via config?

anijain2305 · 2024-10-28T02:42:49Z

> Are you going to allow controlling the unsoundness via config?

Yes, skip_nnmodule_hook_guards controls this.

ezyang · 2024-10-28T03:40:46Z

I don't see any reference to it in this PR, is there some nontrivial interaction with earlier PRs in the stack?

anijain2305 · 2024-10-28T04:36:49Z

> I don't see any reference to it in this PR, is there some nontrivial interaction with earlier PRs in the stack?

It's not obvious from this PR. With this change, empty nn module hooks on a user-defined nn module insert an EMPTY_NN_MODULE_HOOKS_DICT guard (see snippet), which is skipped when the skip_nnmodule_hook_guards flag is set (see permalink).

pytorch/torch/_dynamo/guards.py

Lines 1681 to 1686 in d2052ea

    
    def EMPTY_NN_MODULE_HOOKS_DICT(self, guard):
        """Special guard to skip guards on empty hooks. This is controlled by skip_nnmodule_hook_guards"""
        if config.skip_nnmodule_hook_guards:
            # This is unsafe if you add/remove a hook on nn module variable
            return
        self.SEQUENCE_LENGTH(guard)

For due diligence, let me also add a test.
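The trade-off described in this comment can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Dynamo's implementation: `ToyConfig`, `ToyModule`, `build_guards`, `cache_hit`, and `demo` are all hypothetical names, and the length check merely stands in for the SEQUENCE_LENGTH guard in the snippet.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real config flag discussed above.
    skip_nnmodule_hook_guards: bool = True


@dataclass
class ToyModule:
    # Hypothetical stand-in for one of an nn module's hook dictionaries.
    forward_hooks: Dict[int, Callable] = field(default_factory=dict)


def build_guards(
    module: ToyModule, config: ToyConfig
) -> List[Callable[[ToyModule], bool]]:
    """Collect the checks that must pass before cached code is reused."""
    guards: List[Callable[[ToyModule], bool]] = []
    if not config.skip_nnmodule_hook_guards:
        # Record the hook count seen at "compile" time and check it later.
        expected_len = len(module.forward_hooks)
        guards.append(lambda m: len(m.forward_hooks) == expected_len)
    return guards


def cache_hit(module: ToyModule, guards: List[Callable[[ToyModule], bool]]) -> bool:
    """Cached code is reused only if every guard still passes."""
    return all(guard(module) for guard in guards)


def demo(skip: bool) -> bool:
    module = ToyModule()
    guards = build_guards(module, ToyConfig(skip_nnmodule_hook_guards=skip))
    # A hook is registered after the guards were built.
    module.forward_hooks[0] = lambda *args: None
    return cache_hit(module, guards)


print("skip=True  -> cache reused:", demo(True))
print("skip=False -> cache reused:", demo(False))
```

With the flag set, no check is installed, so the stale cached code is reused even though a hook was added; that is the unsoundness being discussed. With the flag cleared, the length check fails and the toy cache misses, which in Dynamo would correspond to a recompile.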

anijain2305 · 2024-10-29T02:09:48Z

@pytorchbot merge -f "unrelated CI failure"

pytorchmergebot · 2024-10-29T02:11:18Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

This brings some unsoundness in guards. Earlier we were skipping empty nn module hooks dict guard only on inbuilt nn modules, but as seen in pytorch#138386, there could still be significant guard overhead. With this PR, we reduce the guard eval latency from 420 us to 280 us (1.5x reduction).

Pull Request resolved: pytorch#138942
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: pytorch#139040, pytorch#138954

ezyang · 2024-11-01T00:15:19Z

This seems to result in a modest compile time improvement in HF inference for MobileBertForMaskedLM, probably because processing all the extra guards is also expensive.

Samples: https://fburl.com/scuba/torch_open_source_signpost/5snsgy3x

[dynamo][guards] Skip guards on empty nn module hooks

c73bede

[ghstack-poisoned]

This was referenced Oct 25, 2024

[dynamo][refactor][config-cleanp] Use guard_manager consistently instead of check_fn #138896

Closed

[dynamo][guards] Log average time of constructed guard_manager #138941

Closed

pytorch-bot bot added ciflow/inductor module: dynamo labels Oct 25, 2024

anijain2305 mentioned this pull request Oct 25, 2024

[dynamo][guards] Skip no tensor aliasing guards on parameters #138954

Closed

anijain2305 added 2 commits October 25, 2024 16:40

anijain2305 requested review from ezyang, jansel and yf225 October 26, 2024 04:17

anijain2305 added a commit that referenced this pull request Oct 26, 2024

[dynamo][guards] Skip guards on empty nn module hooks

15bcc63

ghstack-source-id: 2ff244b Pull Request resolved: #138942

anijain2305 added the ciflow/trunk and topic: not user facing labels Oct 26, 2024

anijain2305 mentioned this pull request Oct 27, 2024

[dynamo] Prevent Dynamo from triggering on an unwanted frame #139022

Closed

anijain2305 mentioned this pull request Oct 28, 2024

[dynamo] "skip_guard_eval_unsafe" API for power users #139038

Closed

anijain2305 mentioned this pull request Oct 28, 2024

[dynamo][refactor] Remaining cleanup from config-cleanup of enable_cpp_guard_manager #139040

Closed

ezyang approved these changes Oct 28, 2024

jansel approved these changes Oct 29, 2024

pytorchmergebot added the merging label Oct 29, 2024

pytorchmergebot added the Merged label Oct 29, 2024

pytorchmergebot closed this in e80fe7f Oct 29, 2024

pytorchmergebot removed the merging label Oct 29, 2024

github-actions bot deleted the gh/anijain2305/561/head branch December 1, 2024 02:20
