Skip to content

Optimize dynamo guards for global hooks dicts#96246

Closed
wconstab wants to merge 1 commit intogh/wconstab/127/basefrom
gh/wconstab/127/head
Closed

Optimize dynamo guards for global hooks dicts#96246
wconstab wants to merge 1 commit intogh/wconstab/127/basefrom
gh/wconstab/127/head

Conversation

@wconstab
Copy link
Contributor

@wconstab wconstab commented Mar 8, 2023

Stack from ghstack (oldest at bottom):

Use 'BOOL_FALSE' guard (which codegens to not value) in place of
'DICT_KEYS' guard (which codegens to call a slower type check function)
when guarding on global nnmodule dicts being empty.

Note: it's definitely not foolproof to use BOOL_FALSE and skip type check,
since runtime types could fail to execute bool(runtime_obj), e.g. if obj is
a tensor. We are opting hooks dicts into special treatment on the assumption
of well behaved users.

Note to reviewers: my previous attempt to use BOOL_FALSE only applied to per-nn.module hooks. I am guessing that this attempt will not yield significant gains since it only affects the 4 global dicts and not the N copies of per-module dicts, but it could be that the runtime of check_type_id is significant enough to matter even there.

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

Use 'BOOL_FALSE' guard (which codegens to `not value`) in place of
'DICT_KEYS' guard (which codegens to call a slower type check function)
when guarding on global nnmodule dicts being empty.

Note: it's definitely not foolproof to use BOOL_FALSE and skip type check,
since runtime types could fail to execute bool(runtime_obj), e.g. if obj is
a tensor.  We are opting hooks dicts into special treatment on the assumption
of well behaved users.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96246

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e6b51d6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Mar 8, 2023
Use 'BOOL_FALSE' guard (which codegens to `not value`) in place of
'DICT_KEYS' guard (which codegens to call a slower type check function)
when guarding on global nnmodule dicts being empty.

Note: it's definitely not foolproof to use BOOL_FALSE and skip type check,
since runtime types could fail to execute bool(runtime_obj), e.g. if obj is
a tensor.  We are opting hooks dicts into special treatment on the assumption
of well behaved users.

ghstack-source-id: 9d7425c
Pull Request resolved: #96246
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 8, 2023

Unknown label ciflow/inductor-perf-compare.
Currently recognized labels are

  • ciflow/binaries
  • ciflow/binaries_conda
  • ciflow/binaries_libtorch
  • ciflow/binaries_wheel
  • ciflow/inductor
  • ciflow/inductor-perf-test-nightly
  • ciflow/mps
  • ciflow/nightly
  • ciflow/periodic
  • ciflow/trunk
  • ciflow/unstable

@wconstab wconstab added the topic: not user facing topic category label Mar 8, 2023
@wconstab
Copy link
Contributor Author

wconstab commented Mar 8, 2023

Looks like I can't run the perf CI currently as it's being revamped.

But a local run on one model that seemed particularly sensitive to guards overhead in my previous sweeps (MobileBertForQuestionAnswering)

TORCH_COMPILE_DEBUG=1 python benchmarks/dynamo/huggingface.py --only MobileBertForQuestionAnswering --inductor --train --performance

Run master (with unsafe nnmodule optimization) revert this PR
1 1.265x 1.187x 1.183x
2 1.266x 1.188x 1.188x

digging in further, I realize I missed something in my previous explanation of why my optimization made things faster.

When _has_hooks bool exists, we guard on its literal value matching python's singleton False obj. in other words, id(False) maps to 7333504.
___check_obj_id(self.encoder.layer[23].output.bottleneck.LayerNorm._has_hooks, 7333504)

With my attempt to run a faster 'BOOL_FALSE' guard, i go from checking the length and type of a dict to calling the bool() operator on the dict, but it appears even that is too slow.

Cmd Runtime
timeit.timeit("isinstance(x, dict) and bool(x)", number=10000000, globals={'x': {}}) 0.9805532030295581
timeit.timeit("bool(x)", number=10000000, globals={'x': False}) 0.57676091598114
timeit.timeit("x is False", number=10000000, globals={'x': False}) 0.1867956609930843

For context, in MobileBertForQuestionAnswering there are 651 separate nnModules in the model, requiring

  • 651 obj_id checks (+1 global obj_id check) on master
  • 2604 bool() checks (+4 global bool() checks) on this PR (*4 separate hook dicts per module)

So the speedup from the 'unsound' nnmodule PR is multiplicative 4x * 3x ~= 12x (3x coming from 0.57 / 0.18), but i don't think there is a way to do this optimization without changing the nnmodule side.

Unless I am mistaken, this only leaves

  • keeping (and improving) the nnmodule hacktimization
  • using a dynamo config flag to only support hooks sometimes
  • living with the slower guards (is there ANY other way to make faster bool() guards in dynamo?)

@voznesenskym
Copy link
Collaborator

Unless I am mistaken, this only leaves

  • keeping (and improving) the nnmodule hacktimization
  • using a dynamo config flag to only support hooks sometimes
  • living with the slower guards (is there ANY other way to make faster bool() guards in dynamo?)

What about moving nn module guards to c++?

@github-actions
Copy link
Contributor

github-actions bot commented May 7, 2023

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 7, 2023
@github-actions github-actions bot closed this Jun 7, 2023
@facebook-github-bot facebook-github-bot deleted the gh/wconstab/127/head branch July 7, 2023 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants