Optimize dynamo guards for global hooks dicts#96246
wconstab wants to merge 1 commit into gh/wconstab/127/base
Conversation
Use the 'BOOL_FALSE' guard (which codegens to `not value`) in place of the 'DICT_KEYS' guard (which codegens a call to a slower type-check function) when guarding on global nn.Module hooks dicts being empty. Note: it's definitely not foolproof to use BOOL_FALSE and skip the type check, since some runtime types can fail to execute `bool(runtime_obj)`, e.g. if the object is a tensor. We are opting hooks dicts into special treatment on the assumption of well-behaved users.
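The difference between the two guard forms can be sketched as follows; the helper `check_type_id` is a stand-in for Dynamo's internal type-check helper, not the real implementation:

```python
# Illustrative sketch of the two guard forms discussed in this PR.
# check_type_id is a stand-in for Dynamo's internal exact-type check.

def check_type_id(obj, expected_type_id):
    # Exact-type check by comparing the id() of the object's type.
    return id(type(obj)) == expected_type_id

hooks = {}  # e.g. one of the global hooks dicts on nn.Module

# DICT_KEYS-style guard: a type check plus a keys comparison.
dict_keys_guard = check_type_id(hooks, id(dict)) and set(hooks.keys()) == set()

# BOOL_FALSE-style guard: codegens to just `not value`.
bool_false_guard = not hooks

print(dict_keys_guard, bool_false_guard)  # True True for an empty dict

# The caveat from the description: `not value` assumes bool(value) is well
# defined. A multi-element torch.Tensor raises on bool(), which is why
# skipping the type check is only safe for trusted ("well behaved") objects.
```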
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/96246
✅ No failures as of commit e6b51d6. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Looks like I can't run the perf CI currently as it's being revamped, but here is a local run on one model that seemed particularly sensitive to guards overhead in my previous sweeps (MobileBertForQuestionAnswering):
Digging in further, I realize I missed something in my previous explanation of why my optimization made things faster. When the `_has_hooks` bool exists, we guard on its literal value matching Python's singleton `False` object. In other words, with my attempt to run a faster 'BOOL_FALSE' guard, I go from checking the length and type of a dict to calling the …
For context, in MobileBertForQuestionAnswering there are 651 separate nn.Modules in the model, requiring …
So the speedup from the 'unsound' nnmodule PR is multiplicative: 4x * 3x ≈ 12x (the 3x coming from 0.57 / 0.18), but I don't think there is a way to do this optimization without changing the nn.Module side. Unless I am mistaken, this only leaves …
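The guard strategies being compared here can be approximated with a rough micro-benchmark; the lambda bodies are sketches of what the generated guard code does (not Dynamo's actual codegen), and the absolute numbers are machine-dependent:

```python
import timeit

d = {}             # stand-in for an empty hooks dict
has_hooks = False  # stand-in for a cached _has_hooks bool
false_id = id(False)

# ID guard on a cached bool: a pointer comparison against the False singleton.
t_id = timeit.timeit(lambda: id(has_hooks) == false_id, number=1_000_000)

# BOOL_FALSE guard on the dict itself: `not d` calls bool(d) under the hood.
t_bool = timeit.timeit(lambda: not d, number=1_000_000)

# DICT_KEYS-style guard: an exact type check plus a keys comparison.
t_keys = timeit.timeit(
    lambda: type(d) is dict and set(d.keys()) == set(),
    number=1_000_000,
)

print(f"id check:   {t_id:.3f}s")
print(f"bool check: {t_bool:.3f}s")
print(f"keys check: {t_keys:.3f}s")
```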
What about moving nn.Module guards to C++?
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
Note to reviewers: my previous attempt to use BOOL_FALSE only applied to per-nn.module hooks. I am guessing that this attempt will not yield significant gains since it only affects the 4 global dicts and not the N copies of per-module dicts, but it could be that the runtime of check_type_id is significant enough to matter even there.
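To make the scaling concern concrete, here is a back-of-the-envelope guard count. The 651 modules and 4 global dicts come from this thread; the per-module hooks-dict count of 3 is an assumption for illustration only:

```python
# Back-of-the-envelope guard accounting for MobileBertForQuestionAnswering.
num_modules = 651          # separate nn.Modules reported in this thread
global_hook_dicts = 4      # global hooks dicts this PR optimizes
per_module_hook_dicts = 3  # assumed per-module dicts, e.g. _forward_hooks,
                           # _forward_pre_hooks, _backward_hooks

per_module_guards = num_modules * per_module_hook_dicts
share = global_hook_dicts / (per_module_guards + global_hook_dicts)
print(per_module_guards)  # 1953 per-module hooks-dict guards
print(f"{share:.2%}")     # 0.20% -- the global dicts are a tiny fraction
```

Under these assumptions the 4 global-dict guards are a fraction of a percent of the hooks-related guards, consistent with the expectation that this PR alone yields little end-to-end gain.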
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire