Disable python dispatcher in fallthrough for PyOperators by angelayi · Pull Request #95891 · pytorch/pytorch

angelayi · 2023-03-02T17:52:43Z

Possible fix for #89037

Context: The existing fallthrough implementation for PyOperators causes the PythonDispatcher to redispatch to itself indefinitely. This line permanently adds the PythonDispatcher key to the dispatch key set, which we then read on this line. We temporarily fixed this by excluding the PythonDispatcher key from the global keyset (here), but that runs into an issue in the implementation for the functionalization key, where we want to call functionalize for the true/false subgraphs and make_fx to check for aliasing/mutations, which requires having the PythonDispatcher key.

Our attempt at fixing this is to modify the fallthrough function to ignore the PythonDispatcher key when generating the keys to redispatch to. This should prevent the infinite recursion without modifying the global state, so the PythonDispatcher key stays in the global keyset.
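To make the failure mode concrete, here is a minimal toy model of the description above. It is a sketch under stated assumptions, not PyTorch's implementation: plain strings stand in for dispatch keys, and `PRIORITY`, `highest_priority`, `redispatch_keys`, and `run` are hypothetical names invented for illustration.

```python
# Toy model only: strings stand in for dispatch keys, listed from highest
# to lowest priority. None of these names are PyTorch API.
PRIORITY = ["PythonDispatcher", "PythonTLSSnapshot", "AutogradCPU", "CPU"]


def highest_priority(keys):
    """Return the first key in PRIORITY order that is present in `keys`."""
    for key in PRIORITY:
        if key in keys:
            return key
    raise RuntimeError("no dispatch key left")


def redispatch_keys(current, keyset, ignore_python_dispatcher):
    """Compute the keys that are eligible after `current` falls through."""
    below = set(PRIORITY[PRIORITY.index(current) + 1:])
    remaining = set(keyset) & below
    if not ignore_python_dispatcher:
        # Models the problematic behavior: the key is always added back.
        remaining.add("PythonDispatcher")
    return remaining


def run(keyset, fallthrough, ignore_python_dispatcher, max_steps=10):
    """Follow fallthroughs until a non-fallthrough key is hit; return the trace."""
    trace = []
    keys = set(keyset)
    for _ in range(max_steps):
        key = highest_priority(keys)
        trace.append(key)
        if key not in fallthrough:
            return trace
        keys = redispatch_keys(key, keys, ignore_python_dispatcher)
    raise RecursionError("dispatch did not terminate: " + " -> ".join(trace))
```

In this model, calling `run` with `ignore_python_dispatcher=False` and a fallthrough registered for `"PythonDispatcher"` keeps selecting the same key until the step limit is hit, while `ignore_python_dispatcher=True` moves on to the next key without touching any global state.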

pytorch-bot · 2023-03-02T17:52:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95891

❌ 1 Failures

As of commit cea298f:

NEW FAILURES - The following jobs have failed:

lintrunner / linux-job (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang · 2023-03-03T03:32:51Z

how urgent is this

angelayi · 2023-03-03T13:20:27Z

> how urgent is this

semi-urgent? This is blocking turning on functionalization for DPE which uses the control flow ops. Right now we use a hacky version of functionalization which just skips the control flow ops, but it should be removed...

ezyang · 2023-03-03T14:59:11Z

I feel like there is probably a much simpler fix for the problem. Thinking.

ezyang · 2023-03-03T16:06:43Z

functorch/experimental/_cond.py

    """
    try:
-        gm = make_fx(branch)(*fake_inputs)
+        gm = make_fx(branch)(*inputs)


nit: try to avoid unrelated refactor like this, it makes it harder for reviewer to see what's going on

angelayi (Mar 4, 2023): Refactored those fixes into #95988, so those changes should disappear in this PR after that one merges!

ezyang · 2023-03-03T16:37:08Z

see #89037 (comment)

ezyang · 2023-03-07T02:18:43Z

Hmmph. I don't like this, because you're smearing out state on the PyOperator that should be on a per call basis. And in fact, it's not even right, because if I fallthrough a key and then redispatch, I will redo that key (the fallthrough is not sticky!)

ezyang · 2023-03-07T02:19:01Z

Can you tell me more about why the approach we described in VC didn't work out?

angelayi · 2023-03-07T19:16:54Z

> Can you tell me more about why the approach we described in VC didn't work out?

My understanding from the VC is that the order we want is this: every key that has run (besides the PythonDispatcher) should not be dispatched to again, and control should go back to the PythonDispatcher to dispatch to the following key. So the order should look something like PythonDispatcher -> PythonTLSSnapshot -> PythonDispatcher -> AutogradCPU -> PythonDispatcher ....

The way I thought to do that was to use the ExcludeDispatchKeyGuard to prevent those keys from being dispatched to again. But because that affects the global set of keys, it prevented the inner make_fx call we make in cond from running correctly.

> if I fallthrough a key and then redispatch, I will redo that key (the fallthrough is not sticky!)

If you fallthrough a key wouldn't it get added to the list of keys that have been run already and redispatch to PythonDispatcher?

ezyang · 2023-03-07T21:36:08Z

> The way I thought to do that was to use the ExcludeDispatchKeyGuard to prevent those keys from being dispatched to again. But because that affects the global set of keys, it prevented the inner make_fx call we make in cond from running correctly.

Not necessary, because the Python dispatcher can compute the correct key to go to. Then you just call it directly (op_dk, or just call the callable in your Python-side dispatch dict).
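A rough sketch of the "compute the key, then call it directly" idea, assuming a plain dict as the Python-side dispatch table. The names `next_key`, `make_table`, and `call` are hypothetical and exist only for this illustration; the real dispatcher works on DispatchKeySet objects rather than sets of strings.

```python
# Toy priority order; strings stand in for dispatch keys.
PRIORITY = ["PythonTLSSnapshot", "AutogradCPU", "CPU"]


def next_key(keyset, after=None):
    """Return the highest-priority key in `keyset` that comes after `after`."""
    start = 0 if after is None else PRIORITY.index(after) + 1
    for key in PRIORITY[start:]:
        if key in keyset:
            return key
    raise RuntimeError("no dispatch key left")


def make_table(log):
    """Build a toy dispatch dict whose kernels record the order they ran in."""

    def autograd(table, keyset, x):
        log.append("AutogradCPU")
        # Redispatch by computing the next key and calling its kernel
        # directly; no guard and no shared state is modified.
        key = next_key(keyset, after="AutogradCPU")
        return table[key](table, keyset, x)

    def cpu(table, keyset, x):
        log.append("CPU")
        return x * 2

    return {"AutogradCPU": autograd, "CPU": cpu}


def call(table, keyset, x):
    """Entry point: look up the first key and invoke its kernel."""
    return table[next_key(keyset)](table, keyset, x)
```

Here the "already handled" information lives in the arguments of each call rather than in a guard, so a nested call that starts from `call` again sees the full keyset.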

ezyang · 2023-03-08T14:17:37Z

No test?

Fallthrough is modeled as a mask which we use to remove keys from the compute dispatch key set for eligibility. It's possible this addresses #89037 in a better way than #95891 but I cannot easily tell as the original repro no longer works and the new PR does not have a test. Signed-off-by: Edward Z. Yang <ezyang@meta.com> [ghstack-poisoned]

Fallthrough is modeled as a mask which we use to remove keys from the compute dispatch key set for eligibility. It's possible this addresses #89037 in a better way than #95891 but I cannot easily tell as the original repro no longer works and the new PR does not have a test.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: #96304
Approved by: https://github.com/zou3519, https://github.com/albanD, https://github.com/zhxchen17
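The "fallthrough as a mask" idea from this commit message can be sketched as follows. This is a toy illustration under assumptions, not the code that landed in #96304: `ToyOperator` and its methods are hypothetical, and strings stand in for dispatch keys.

```python
# Toy priority order; strings stand in for dispatch keys.
PRIORITY = ["PythonDispatcher", "PythonTLSSnapshot", "AutogradCPU", "CPU"]


class ToyOperator:
    """Hypothetical operator with a kernel table and a fallthrough mask."""

    def __init__(self):
        self.table = {}
        self.fallthrough_mask = set()

    def impl(self, key, fn):
        """Register a kernel for `key`."""
        self.table[key] = fn

    def fallthrough(self, key):
        """Mark `key` as fallthrough by adding it to the mask."""
        self.fallthrough_mask.add(key)

    def dispatch(self, keyset, *args):
        """Remove masked keys, then run the highest-priority remaining kernel."""
        eligible = set(keyset) - self.fallthrough_mask
        for key in PRIORITY:
            if key in eligible and key in self.table:
                return self.table[key](*args)
        raise RuntimeError("no eligible kernel")
```

Since masked keys are dropped before the highest-priority key is computed, a fallthrough key is never selected in this model, so it cannot dispatch back to itself.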

zhxchen17 · 2023-03-09T19:17:55Z

Trying to reach out to Angela to see what she'll do for this PR. She's on PTO right now.

ezyang · 2023-03-10T01:31:25Z

Please check whether the actual use case you needed is already fixed on master; I landed a set of orthogonal changes which should fix infinite fallthrough loops. I have no way of testing since this PR doesn't have a test.

angelayi · 2023-03-13T06:13:30Z

Yup, the actual case is fixed on master. #96635 to remove the existing hacky fallthrough (the tests were already landed previously).

angelayi added the topic: not user facing topic category label Mar 2, 2023

angelayi requested review from ezyang, voznesenskym, ydwu4 and zhxchen17 March 2, 2023 18:45

angelayi marked this pull request as ready for review March 2, 2023 18:46

angelayi requested a review from tugsbayasgalan March 2, 2023 18:46

ezyang reviewed Mar 3, 2023

ezyang mentioned this pull request Mar 3, 2023

PyOperator.fallthrough(DispatchKey.PythonDispatcher) will cause infinite recursion during redispatch. #89037

Closed

angelayi added 5 commits March 4, 2023 01:01

Disable python dispatcher

9f9116b

Fix alias issue

28eba46

lint

6f5686b

attempt #2

dbf18e3

cleanup

cea298f

angelayi force-pushed the cond_pyd branch from bff6c1a to cea298f Compare March 4, 2023 01:02

angelayi requested a review from ezyang March 7, 2023 00:36

ezyang mentioned this pull request Mar 8, 2023

Rewrite fallthrough to more closely match how C++ works #96304

Closed

angelayi closed this Mar 13, 2023

github-actions bot deleted the cond_pyd branch September 2, 2024 02:01


3 participants
