
Conversation

@anijain2305 (Contributor) commented Mar 27, 2025

@pytorch-bot bot commented Mar 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150082

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit df4afb4 with merge base 15dbad2:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

anijain2305 added a commit that referenced this pull request Mar 27, 2025
ghstack-source-id: 7436df3
Pull Request resolved: #150082
@anijain2305 added the ciflow/trunk (Trigger trunk jobs on your pull request) and topic: not user facing (topic category) labels Mar 27, 2025
@zou3519 (Contributor) commented Mar 27, 2025

I was talking about this with @angelayi and @ydwu4. Inductor might not be able to handle the None output, so the strategy we were considering is that Dynamo should remove the None return from invoke_subgraph's subgraph. I haven't read through the PR yet, so I'm not sure if it takes this approach.

Comment on lines +771 to +780
def test_return_none_from_fwd(self):
@mark_compile_region
def gn(x):
return x * 2, None, x * 3

def fn(x):
ys = gn(x)
return ys[0] + ys[2]

opt_fn = torch.compile(fn, backend="inductor", fullgraph=True)
Contributor:

Can we see an expecttest for the graph?

anijain2305 (Contributor Author):

Added for both Dynamo and AOT fwd and bwd
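
For readers unfamiliar with the expecttest pattern being asked about, a generic sketch is below; the test name, backend wiring, and elided expected string are placeholders for illustration, not the actual test added in this PR:

import torch
import torch._dynamo.testing
from torch.testing._internal.common_utils import TestCase, run_tests

class NoneOutputGraphTest(TestCase):
    def test_graph_capture(self):
        def fn(x):
            return x * 2, None, x * 3

        # Record the Dynamo-captured graph so it can be compared as a string.
        backend = torch._dynamo.testing.EagerAndRecordGraphs()
        opt_fn = torch.compile(fn, backend=backend, fullgraph=True)
        opt_fn(torch.randn(8))
        # The expected string is elided here; running with EXPECTTEST_ACCEPT=1
        # fills it in with the printed graph.
        self.assertExpectedInline(
            backend.graphs[0].print_readable(print_output=False),
            """...""",
        )

if __name__ == "__main__":
    run_tests()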

@anijain2305 (Contributor Author) commented Mar 27, 2025

A subgraph returning None is totally ok with Inductor. I worked on plumbing that last year.

Here is the output code from Inductor for the test case - https://www.internalfb.com/phabricator/paste/view/P1768286550

The approach here:

  1. Dynamo: no change; keep the None return in the forward graph.
  2. The fwd_graph of the joint_graph will have to return None. Since this is wrapped in an autograd.Function, the grad_in will also have None in the corresponding slot for the backward of the autograd.Function op. But the backward graph does not have to worry about None, so we construct the backward graph without None and filter out the None entries in the backward of the autograd.Function object (see the sketch below).
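
A minimal sketch of that filtering idea, written as a hand-rolled autograd.Function purely for illustration (the class below is an assumption for exposition, not the invoke_subgraph implementation in this PR):

import torch
from torch.autograd import Function

class SubgraphWithNoneOutput(Function):
    # Mimics a subgraph whose forward returns None in one output slot.
    @staticmethod
    def forward(ctx, x):
        # The None output stays in place in the forward, as in the steps above.
        return x * 2, None, x * 3

    @staticmethod
    def backward(ctx, *grad_outs):
        # autograd passes one grad_out per forward output; the slot that
        # corresponds to the None output is itself None (with the default
        # materialize_grads). The backward math is written without None,
        # so the None slots are dropped before use.
        g0, g2 = [g for g in grad_outs if g is not None]
        return g0 * 2 + g2 * 3

x = torch.randn(4, requires_grad=True)
a, _, b = SubgraphWithNoneOutput.apply(x)
(a + b).sum().backward()
# x.grad == 5 everywhere: d(2x)/dx + d(3x)/dx, unaffected by the None output.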

anijain2305 added a commit that referenced this pull request Mar 28, 2025
ghstack-source-id: fc394df
Pull Request resolved: #150082
Comment on lines 253 to 259
# force the grad_outs to be contiguous. Some of the grads can be None,
# because the forward outs could be None. Filter them out.
contiguous_grad_outs = []
for o in grad_outs:
if o is not None:
contiguous_grad_outs.append(o.contiguous())
contiguous_grad_outs = tuple(contiguous_grad_outs)
@zou3519 (Contributor) commented Mar 28, 2025:

Can you assert that the only None grad_outs are the ones where the forward was None? Because this happens during tracing of the outer graph, this won't increase runtime.

The code here is only correct if we do not change ctx.set_materialize_grads (default True). In the future, I expect we'll want to set it to False to improve performance, which will lead to the following correctness issue. The assertion will help us catch these issues when we flip it.

  • An output being None does imply that the grad_out is None.
  • However, at trace time of the outer forward+backward, if ctx.set_materialize_grads(False) is used, it is possible that an output of invoke_subgraph is a Tensor but its grad_out is None. This happens if the gradient for that Tensor was never computed or if autograd optimized it away (because it determined it was zero).
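
A rough sketch of the suggested assertion, assuming the forward outputs are available at this point under a placeholder name fwd_outs (not an identifier from this PR):

# Hypothetical sketch of the requested check; `fwd_outs` stands in for however
# the forward outputs are made available inside the backward.
contiguous_grad_outs = []
for fwd_out, grad_out in zip(fwd_outs, grad_outs):
    if grad_out is None:
        # With the default ctx.set_materialize_grads (True), a None grad_out
        # should only appear where the corresponding forward output was None.
        assert fwd_out is None, "None grad_out for a non-None forward output"
    else:
        contiguous_grad_outs.append(grad_out.contiguous())
contiguous_grad_outs = tuple(contiguous_grad_outs)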

@zou3519 (Contributor) left a comment:

LGTM, but added a comment for an extra assertion we should add

anijain2305 added a commit that referenced this pull request Apr 1, 2025
ghstack-source-id: 0a2f92d
Pull Request resolved: #150082
@anijain2305 requested a review from zou3519 on April 1, 2025 at 21:19
@pytorchmergebot (Collaborator):
Starting merge as part of PR stack under #150450

amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, topic: not user facing (topic category)
