Conversation

@zhxchen17 (Contributor) commented Jan 10, 2023

Fixes #ISSUE_NUMBER

We want to add support for the control flow map() operator at the dynamo level, to unblock some internal models that will have to use the map() operator in the captured graph. Basically, I replicate the pattern used to implement the cond() op in #90286.
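For context, a minimal sketch of the kind of program this is meant to capture. The import path for the map operator and the compile entry point here are assumptions for illustration, not part of this PR:

```python
import torch
import torch._dynamo as dynamo
from functorch.experimental import control_flow  # assumed import path

def body(x, y):
    # Called on one slice of xs (i.e. xs[i]) at a time.
    return x + y

def f(xs, y):
    # map() applies `body` to xs[0], xs[1], ... and stacks the results.
    return control_flow.map(body, xs, y)

xs = torch.randn(3, 4)
y = torch.randn(4)
out = dynamo.optimize("eager")(f)(xs, y)  # map() should stay in the captured graph
```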

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

pytorch-bot bot commented Jan 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91939

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3cfcad6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zhxchen17 zhxchen17 changed the title from "[dynamo] Support map() operator." to "[dynamo] Support control flow map() operator." Jan 10, 2023
@ezyang ezyang requested a review from zou3519 January 10, 2023 22:16
tx.output.register_attr_or_module(gm, next_name, source=src)
return next_name

def make_subgraph(f, sub_args, graph_checkpoint, checkpoint):
Review comment (Contributor):

This should have "speculate" in its name as it doesn't actually apply the changes to Dynamo state

body_nn_modules,
body_cmp,
) = make_subgraph(args[0], [
wrap_fx_proxy(tx, args[1].as_proxy()[0], **VariableTracker.propagate(args[1])),
Review comment (Contributor):

This looks suspicious. What's going on here?

"body", torch.fx.GraphModule(body_nn_modules, body_graph)
)

# Apply side effects (guaranteed to be equal)
Review comment (Contributor):

This comment is out of date; you only ran one branch.

*(arg.as_proxy() for arg in args[1:])
)
r = body_r.as_proxy().node.meta["example_value"]
example_value = r.new_empty([get_fake_value(args[1].as_proxy().node, tx).shape[0], *r.shape])
Review comment (Contributor):

This looks wrong.

@ezyang (Contributor) left a comment:

Broadly speaking, the implementation here looks like it's cargo culted from cond's implementation, but this is not really appropriate, because map is quite different from cond in two respects:

  1. There's only one lambda, but it can get run multiple times
  2. The input/output of the lambda diverge from the outer context

(1) means that you need to, for example, assert that there are no side effects in the lambda (or that the side effects are idempotent or something). Suppose that inside the lambda you call nonlocal x; x += 1. The eager mode map semantics will increment this counter every iteration of the loop. But what you have implemented here applies the side effects once, and then is done. You will have the wrong semantics in this case.

(2) appears to be what is going on with the weirdness of making an fx proxy tensor and then the stuff with new_empty to make the example value. I can believe that the implementation as is works in most cases, but it needs to be documented far better. In particular, when you say x[0] you are leaning on the fact that make_subgraph doesn't actually use the passed-in proxies in an interesting way; instead, it converts them into nodes. Otherwise, it would be incorrect to say that you had called the lambda with x[0]. In fact, I think I might prefer that we not pass in a proxy (except maybe to make the names nicer), because the implementation here will not work if you map over a tensor with size (0, *sizes) (since x[0] will fail in this case).

I will expect tests for all of these edge cases.
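To make point (1) concrete, a small sketch of a loop body with a side effect (reusing the illustrative control_flow.map import from the sketch in the PR description above):

```python
def f(xs):
    count = 0

    def body(x):
        nonlocal count
        count += 1          # eager map() runs this once per slice of xs
        return x * count

    out = control_flow.map(body, xs)
    # Eager semantics: count == xs.shape[0] here. A capture that speculates
    # `body` a single time and applies its side effects once would bake in
    # count == 1 and silently compute the wrong thing.
    return out, count
```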

@zhxchen17 (Contributor, Author) commented Jan 12, 2023

@ezyang All of these make a lot of sense, thanks!
So my basic assumption here is that we are not going to support loop bodies that have side effects, because we haven't seen real use cases anyway, but it's true that I need to check for those.
Just want to make sure I understand point (2) correctly:
Instead of passing x[0] and calling the lambda, we could just construct a new sample value which is not related to the current execution context, and then trace the lambda (assuming it works better for the (0, *sizes) shape)?
In this case, what could we do to provide an example output for the whole torch.map()? I guess we still need to construct the example value based on the example value of the inner body graph.
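(For concreteness, the kind of example-value construction being discussed, as a rough sketch with hypothetical variable names: the map() result is conceptually a stack of per-slice body outputs.)

```python
# body_example: example/fake output of a single body invocation
# xs_fake:      fake value of the tensor being mapped over
example_value = body_example.new_empty(
    [xs_fake.shape[0], *body_example.shape]
)
```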

@ezyang (Contributor) commented Jan 12, 2023

It's easy to say "oh, I will just check that there are no side effects in the loop body", but Dynamo does model some operations (like accessing closed-over variables) as side effects. In any case, try asserting no side effects and see if it errors or not on the models you care about.

Re (2), you understand correctly. You actually raise a good point though, which is that zero-size input cannot work anyway, as you MUST run the lambda to actually get the output shape, but the lambda is unrunnable if you have no samples. So I guess map() as defined here cannot work with zero size, and so maybe accessing the first element is fine. Better add a check for that though...
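A rough sketch of that guard inside the map() handler (args and tx as in the diff above; the import paths and exact error message are assumptions):

```python
from torch._dynamo.exc import unimplemented
from torch._dynamo.utils import get_fake_value

sample = get_fake_value(args[1].as_proxy().node, tx)
if sample.ndim == 0 or sample.shape[0] == 0:
    # We cannot trace the body without at least one element to feed it.
    unimplemented("map() over a scalar or a zero-sized dim-0 tensor")
first = sample[0]  # safe now: there is a sample to speculate the body with
```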

@zhxchen17 zhxchen17 force-pushed the zhxchen17/control_flow/2 branch 2 times, most recently from 43707dc to 2c42b59, January 18, 2023 21:45
@zhxchen17 (Contributor, Author) commented:

Sorry for the late update. Addressed comments with a few updates:

  1. Check for scalar / zero-sized tensors for map() during tracing. Added a unit test case.
  2. Check for extra pending side effects from calling the map() body; currently we just throw an unsupported error if the map() body has any side effects (a sketch of the shape of this check follows below). Added a unit test case.
  3. Nits: added/removed comments about sample inputs for map(); renamed make_subgraph to speculate_subgraph.
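The side-effect check in (2) roughly has this shape; snapshot_side_effects is a hypothetical helper and the error message is illustrative, not the actual dynamo API:

```python
before = snapshot_side_effects(tx)  # hypothetical: capture pending side effects
body_r = speculate_subgraph(f, sub_args, graph_checkpoint, checkpoint)
if snapshot_side_effects(tx) != before:
    unimplemented("map() body with side effects is not supported")
```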

cc @ezyang

@zhxchen17 zhxchen17 requested a review from ezyang January 18, 2023 21:49
@zhxchen17 zhxchen17 added the "topic: not user facing" label Jan 18, 2023
Review comment (Contributor):

I hope this access doesn't induce a write to the FX graph

Review comment (Contributor):

The rest of body_cmp is ignored here. Maybe it is safer to factor out the comparable state into its own function; then you can get the comparable state prior to running the lambda and do a full comparison there.
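A generic illustration of that pattern; the fields and helper names here are invented for the sketch and do not correspond to real dynamo internals:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComparableState:
    # Only the pieces of tracer state that speculating the body must not change.
    guards: frozenset
    module_names: tuple

def comparable_state(tracer):
    return ComparableState(
        guards=frozenset(tracer.guards),
        module_names=tuple(sorted(tracer.modules)),
    )

def checked_speculate(tracer, speculate_body):
    before = comparable_state(tracer)
    result = speculate_body()
    assert comparable_state(tracer) == before, (
        "speculating the body must not change the outer tracing state"
    )
    return result
```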

@zhxchen17 zhxchen17 force-pushed the zhxchen17/control_flow/2 branch from 2c42b59 to 3cfcad6, January 19, 2023 08:10
@zhxchen17 (Contributor, Author) commented:

Updates:

  • Construct a TensorVariable with an inner-graph proxy directly for the sample input xs[0], so that we don't insert an extra getitem node into the parent graph.
  • Factor out the comparable state from the original graph state, and do a full comparison between the original graph state and the loop-body graph state.

@zhxchen17 (Contributor, Author) commented:

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jan 19, 2023
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator) commented:

The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.

@zhxchen17 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: Not merging any PRs at the moment because there is a merge-blocking ci: sev issue (https://github.com/pytorch/pytorch/labels/ci:%20sev) open at #92626.

@zhxchen17 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@github-actions github-actions bot deleted the zhxchen17/control_flow/2 branch July 20, 2024 01:53