
Conversation

@anijain2305 (Contributor) commented Mar 27, 2025

@pytorch-bot bot commented Mar 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150082

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit df4afb4 with merge base 15dbad2:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

anijain2305 added a commit that referenced this pull request Mar 27, 2025
ghstack-source-id: 7436df3
Pull Request resolved: #150082
@anijain2305 added the ciflow/trunk (Trigger trunk jobs on your pull request) and topic: not user facing (topic category) labels Mar 27, 2025
@zou3519 (Contributor) commented Mar 27, 2025

I was talking about this with @angelayi and @ydwu4. Inductor might not be able to handle the None output, so the strategy we were considering is that Dynamo should remove the None return from invoke_subgraph's subgraph. I haven't read through the PR yet, so I'm not sure if it takes this approach.

Comment on lines +771 to +780
def test_return_none_from_fwd(self):
@mark_compile_region
def gn(x):
return x * 2, None, x * 3

def fn(x):
ys = gn(x)
return ys[0] + ys[2]

opt_fn = torch.compile(fn, backend="inductor", fullgraph=True)
Contributor:

Can we see an expecttest for the graph?

anijain2305 (Contributor Author):

Added for both Dynamo and AOT fwd and bwd
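
For readers unfamiliar with the expecttest pattern being asked about, a generic sketch is below; the test name, backend wiring, and elided expected string are placeholders for illustration, not the actual test added in this PR:

import torch
import torch._dynamo.testing
from torch.testing._internal.common_utils import TestCase, run_tests

class NoneOutputGraphTest(TestCase):
    def test_graph_capture(self):
        def fn(x):
            return x * 2, None, x * 3

        # Record the Dynamo-captured graph so it can be compared as a string.
        backend = torch._dynamo.testing.EagerAndRecordGraphs()
        opt_fn = torch.compile(fn, backend=backend, fullgraph=True)
        opt_fn(torch.randn(8))
        # The expected string is elided here; running with EXPECTTEST_ACCEPT=1
        # fills it in with the printed graph.
        self.assertExpectedInline(
            backend.graphs[0].print_readable(print_output=False),
            """...""",
        )

if __name__ == "__main__":
    run_tests()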

@anijain2305 (Contributor Author) commented Mar 27, 2025

A subgraph returning None is totally ok with Inductor. I worked on plumbing that last year.

Here is the output code from Inductor for the test case - https://www.internalfb.com/phabricator/paste/view/P1768286550

The approach here:

  1. Dynamo: no change; keep the None return in the forward graph.
  2. The fwd_graph of the joint_graph will have to return None. Since this is wrapped in an autograd.Function, the grad_in will also have None in the corresponding slot for the backward of the autograd.Function op. But the backward graph does not have to worry about None, so we construct the backward graph without None and filter out the None entries in the backward of the autograd.Function object (see the sketch below).
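
A minimal sketch of that filtering idea, written as a hand-rolled autograd.Function purely for illustration (the class below is an assumption for exposition, not the invoke_subgraph implementation in this PR):

import torch
from torch.autograd import Function

class SubgraphWithNoneOutput(Function):
    # Mimics a subgraph whose forward returns None in one output slot.
    @staticmethod
    def forward(ctx, x):
        # The None output stays in place in the forward, as in the steps above.
        return x * 2, None, x * 3

    @staticmethod
    def backward(ctx, *grad_outs):
        # autograd passes one grad_out per forward output; the slot that
        # corresponds to the None output is itself None (with the default
        # materialize_grads). The backward math is written without None,
        # so the None slots are dropped before use.
        g0, g2 = [g for g in grad_outs if g is not None]
        return g0 * 2 + g2 * 3

x = torch.randn(4, requires_grad=True)
a, _, b = SubgraphWithNoneOutput.apply(x)
(a + b).sum().backward()
# x.grad == 5 everywhere: d(2x)/dx + d(3x)/dx, unaffected by the None output.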

anijain2305 added a commit that referenced this pull request Mar 28, 2025
ghstack-source-id: fc394df
Pull Request resolved: #150082
Comment on lines 253 to 259
# force the grad_outs to be contiguous. Some of the grads can be None,
# because the forward outs could be None. Filter them out.
contiguous_grad_outs = []
for o in grad_outs:
if o is not None:
contiguous_grad_outs.append(o.contiguous())
contiguous_grad_outs = tuple(contiguous_grad_outs)
@zou3519 (Contributor) commented Mar 28, 2025:

Can you assert that the only None grad_outs are the ones where the forward was None? Because this happens during tracing of the outer graph, this won't increase runtime.

The code here is only correct if we do not change ctx.set_materialize_grads (default True). In the future, I expect we'll want to set it to False to improve performance, which will lead to the following correctness issue. The assertion will help us catch these issues when we flip it.

  • An output being None does imply that the grad_out is None.
  • However, at trace time of the outer forward+backward, if ctx.set_materialize_grads(False) is used, it is possible that an output of invoke_subgraph is a Tensor but its grad_out is None. This happens if the gradient for that Tensor was never computed or if autograd optimized it away (because it determined it was zero).
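
A rough sketch of the suggested assertion, assuming the forward outputs are available at this point under a placeholder name fwd_outs (not an identifier from this PR):

# Hypothetical sketch of the requested check; `fwd_outs` stands in for however
# the forward outputs are made available inside the backward.
contiguous_grad_outs = []
for fwd_out, grad_out in zip(fwd_outs, grad_outs):
    if grad_out is None:
        # With the default ctx.set_materialize_grads (True), a None grad_out
        # should only appear where the corresponding forward output was None.
        assert fwd_out is None, "None grad_out for a non-None forward output"
    else:
        contiguous_grad_outs.append(grad_out.contiguous())
contiguous_grad_outs = tuple(contiguous_grad_outs)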

@zou3519 (Contributor) left a comment:

LGTM, but added a comment for an extra assertion we should add

anijain2305 added a commit that referenced this pull request Apr 1, 2025
ghstack-source-id: 0a2f92d
Pull Request resolved: #150082
@anijain2305 requested a review from zou3519 on April 1, 2025 at 21:19
@pytorchmergebot (Collaborator):
Starting merge as part of PR stack under #150450

amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, topic: not user facing (topic category)
