Conversation

@jbschlosser (Contributor) commented Dec 4, 2024

Stack from ghstack (oldest at bottom):

Related: #125914 (specifically see comment)

This PR addresses two broken things involving the usage of unbacked SymInts for calls to `slice()` with data-dependent bounds. These issues are encountered in practice for `narrow()` operating on the batch dim with an NJT input, but apply to other subclasses as well. The test in this PR uses a purpose-built subclass.

There are two different issues here, depending on whether `torch.compile()` is called with `dynamic=True`. In practice, these only occur when the unbacked SymInts are created within the `__torch_dispatch__` implementation of a subclass, because the unbacked symbols are considered "freshly created" when the output subclass instance is handled in Dynamo.

**Error 1 (dynamic=False):**
```
LoweringException: GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(-Min(22, Max(0, u0)) + Min(22, Max(u0 + u1, Max(0, u0))), 0) (unhinted: Eq(-Min(s0, Max(0, u0)) + Min(s0, Max(u0 + u1, Max(0, u0))), 0)).  (Size-like symbols: u1, u0)
```

The expression comes from the use of `clamp()` logic for `SliceView` in Inductor (https://github.com/pytorch/pytorch/blob/41e59754b407533b060b874c22ca4feda38bd83a/torch/_inductor/ir.py#L3014):

```python
return sympy.Min(sympy.Max(x, lower), upper)
```

If the (start, end) bounds for the `slice()` are statically known to be in range for the given dim (e.g. provided via `torch._check()` calls), we can avoid this `clamp()` logic and the error. This PR implements that fix.
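To make this concrete, here is a minimal sketch (assuming `torch._dynamo.config.capture_scalar_outputs = True`; the function name and shapes are made up for illustration and are not the PR's test subclass) of how data-dependent bounds can be asserted in range via `torch._check()`:

```python
import torch

def bounded_slice(t: torch.Tensor, bounds: torch.Tensor) -> torch.Tensor:
    start = bounds[0].item()   # becomes an unbacked SymInt (e.g. u0) under compile
    length = bounds[1].item()  # becomes an unbacked SymInt (e.g. u1) under compile
    torch._check(start >= 0)
    torch._check(length >= 0)
    torch._check(start + length <= t.size(0))
    # With the checks above, (start, start + length) is statically known to be
    # in range for dim 0, so the Min/Max clamp for this SliceView can be skipped.
    return t[start : start + length]
```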

**Error 2 (dynamic=True):**
```
torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {u0} not in returned outputs NestedTensor(size=(2, s16, s1), offsets=FakeTensor(..., device='cuda:0', size=(3,), dtype=torch.int64), grad_fn=<NarrowBackwardAutogradNestedTensor0 object at 0x7f1f8603cfd0>, contiguous=True) ((s1*s16, s1, 1), s1*u0)
```

The storage offset of the values component of the returned NJT is `s1*u0`, where `s1` is a backed SymInt known to be integral. This PR expands the special logic handling the `constant * u0` case (https://github.com/pytorch/pytorch/blob/314e08eb52ad0e9b1c3eb6e149ec8a452e05b9c3/torch/fx/experimental/symbolic_shapes.py#L1013-L1031) to handle SymInt coefficients as well:

```python
# When an unbacked SymInt is perfectly divisible by an integer
# constant, we replace it with the integer constant to improve
# reasoning capabilities.  However, in synthetic examples, it is
# then possible that the factor never is explicitly allocated.
# Fortunately, we can compute it by division.
elif (
    isinstance(a, torch.SymInt)
    and isinstance(s := a.node._expr, sympy.Mul)
    and len(s.args) == 2
    and isinstance(lhs := s.args[0], sympy.Integer)
    and isinstance(rhs := s.args[1], sympy.Symbol)
    and rhs in pending
):
    # TODO: DivideByKey needs to test divisibility at runtime!
    r[rhs] = path + (DivideByKey(int(lhs)),)
    if real is not None:
        assert isinstance(real, int)
        shape_env.set_unbacked_var_to_val(rhs, real // int(lhs))
    pending.remove(rhs)
```
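For intuition, here is a toy `sympy` demonstration of the pattern this branch matches (symbol names mirror the error above; this is plain sympy, not PyTorch internals):

```python
import sympy

# s1 plays the role of a backed size symbol; u0 is the unbacked symbol whose
# binding must be recovered from the storage-offset expression s1*u0.
s1 = sympy.Symbol("s1", integer=True, positive=True)
u0 = sympy.Symbol("u0", integer=True, nonnegative=True)

expr = s1 * u0
assert isinstance(expr, sympy.Mul) and len(expr.args) == 2
coeff, sym = expr.args  # sympy canonically orders the factors; here (s1, u0)
# The pre-existing branch required isinstance(coeff, sympy.Integer); the
# extension also accepts a symbolic coefficient, recovering u0 by division.
assert expr / coeff == u0
```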

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot bot commented Dec 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142062

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 28ae4f8 with merge base e885225:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jbschlosser jbschlosser added the topic: not user facing topic category label Dec 4, 2024
@jbschlosser jbschlosser changed the title from "Unbacked SymInt fixes for slice() on subclasses" to "Unbacked SymInt fixes for subclasses + data-dependent slice() bounds" Dec 4, 2024
@jbschlosser jbschlosser requested a review from ezyang December 4, 2024 18:37
@ezyang (Contributor) commented Dec 6, 2024

What I am missing right now is a more detailed step-through of how exactly we end up in the situation here:

> The storage offset of the values component of the returned NJT is `s1*u0` where `s1` is known to be an integer. This PR expands the special logic handling the `constant * u0` case to handle SymInts as well.

Going to carefully read the unit test now.

@ezyang (Contributor) commented Dec 6, 2024

Adding @bdhirsh @IvanKobzarev as there are subclasses involved

```python
# return value of the torch_dispatch is handled.
# Subclass forwards everything along to the single underlying dense tensor
# component, except for slice(), which it handles via data-dependent bounds access
class CustomSliceSubclass(WrapperTensor):
```
@ezyang (Contributor):

@bdhirsh Do we have a clear description of the correctness invariants for tensor subclasses that have dynamic shapes anywhere?


```python
@classmethod
def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
    if func is torch.ops.aten.slice.Tensor:
```
@ezyang (Contributor):

So, if I understand correctly, when we do Dynamo tracing, we will end up with a `torch.ops.aten.slice.Tensor` node in the graph, which takes the subclass in as an argument. So for the purpose of unbacked-binding calculation, we must be able to determine the allocated unbacked SymInts from the returned tensor of this subclass slice operation?

@jbschlosser (Contributor, Author):

> we will end up with a torch.ops.aten.slice.Tensor node in the graph, which takes the subclass in as an argument.

Yep, this is correct.

> So for the purpose of unbacked-binding calculation, we must be able to determine the allocated unbacked SymInts from the returned tensor of this subclass slice operation?

Yep, that's my understanding as well. `compute_unbacked_bindings()` checks the size / stride / storage_offset of the subclass's outer and inner tensors for these unbacked SymInts. In particular for this subclass, `t.storage_offset()` will be `<something> * u1`, where `<something>` is a constant when `dynamic=False` and a backed SymInt when `dynamic=True`.
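As a concrete eager-mode illustration (made-up sizes, not from the PR) of how the slice start lands in the storage offset:

```python
import torch

# Slicing along dim 0 moves stride(0) * start into the view's storage_offset.
# Under torch.compile, `start` would be the unbacked u1 referenced above.
x = torch.randn(8, 4)             # contiguous, so x.stride(0) == 4
start = 3
view = x[start:start + 2]
assert view.storage_offset() == x.stride(0) * start  # 4 * 3 == 12
```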

@ezyang (Contributor):

My point is that you shouldn't be detecting the offset from storage_offset. IMO, it should just be stored directly somewhere.

```python
return CustomSliceSubclass(
    func(args[0].t, dim=0, start=start, end=(start + length)),
    slice_bounds=args[0].slice_bounds,
)
```
@ezyang (Contributor):

OK, this looks pretty suspicious. Let me explain why.

If I am right in https://github.com/pytorch/pytorch/pull/142062/files#r1873458013, this means that you are in the "this is a custom op that returns unbacked SymInt" situation. The rule for custom ops that return unbacked SymInts is that they must return the SymInt in an "easy to access" way, because that is how we are going to learn how to bind it.

Now, the strategy you seem to have taken in this PR is to use `DivideByKey` to extract the unbacked SymInt out of exactly how you have decided to define the slice subclass representation. But we should also examine whether the slice subclass state should be represented differently.

In particular, you are storing `slice_bounds` as a Tensor. This seems pretty strange. You have accessed the raw int via the `item()` call. Why are you not just storing the `item()` result directly?

I may need to understand the narrow PR better; I vaguely understand that it involves a data-dependent inner size, but I am also uncertain why the start and length are not just immediately derivable from the inner tensor without any division.

@jbschlosser (Contributor, Author) commented Dec 6, 2024:

> This means that you are in the "this is a custom op that returns unbacked SymInt" situation. The rule for custom ops that return unbacked SymInts is that they must return the SymInt in an "easy to access" way, because that is how we are going to learn how to bind it.

This sounds right; in this case, the unbacked SymInt ends up in the `storage_offset` of the returned tensor's inner `t` tensor, but it's not trivial to access it there because a coefficient (either a constant or a SymInt) is involved.

> Now, the strategy you seem to have taken in this PR is to use DivideByKey to extract the unbacked SymInt out of exactly how you have decided to define the slice subclass representation.

Right, the strategy is to extract the unbacked SymInt from the returned tensor's `storage_offset` by pulling it out of `<coefficient> * u1` when `<coefficient>` is a SymInt.

> But we should also examine whether the slice subclass state should be represented differently. In particular, you are storing slice_bounds as a Tensor. This seems pretty strange. You have accessed the raw int via the item() call. Why are you not just storing the item() result directly?

I think updating the subclass representation could make sense in isolation. Unfortunately, this subclass emulates NJT's storage of offsets as a Tensor. For NJT, the bounds for `narrow()` (and thus `slice()`) come from offsets data, so we can't store the `item()` result directly.

> I may need to understand the narrow PR better; I vaguely understand that it involves a data-dependent inner size, but I am also uncertain why the start and length are not just immediately derivable from the inner tensor without any division.

Summarizing the above: `slice()` / `narrow()` bounds come from data-dependent offsets access for NJT. Both of these ops handle the start of the view by changing the storage offset to `<stride> * <start>`, so the unbacked SymInt for the start of the range ends up in the inner tensor's `storage_offset` as `<stride> * u1`. If `<stride>` is a constant, there's no problem, since we already have logic for pulling unbacked SymInts out of `<constant> * u1`. If `<stride>` is a backed SymInt (as is the case when `dynamic=True`), we have problems, and this PR attempts to address that case through the use of `DivideByKey` with a backed SymInt divisor for `<stride>`.
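A hedged sketch of the recovery step being described (the helper name is illustrative; this is not the actual `DivideByKey` implementation): given a concrete storage offset and a known coefficient, the unbacked value comes out by exact division.

```python
def recover_unbacked_start(storage_offset: int, stride0: int) -> int:
    # DivideByKey-style extraction assumes exact divisibility; per the TODO in
    # the quoted code above, this should really be verified at runtime.
    assert storage_offset % stride0 == 0
    return storage_offset // stride0

assert recover_unbacked_start(12, 4) == 3  # e.g. offset s1*u0 = 12, s1 = 4 -> u0 = 3
```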

```python
torch._check(start + length <= inp.size(0))

return CustomSliceSubclass(
    func(args[0].t, dim=0, start=start, end=(start + length)),
```
@ezyang (Contributor):

To make narrow-on-narrow work, do this directly with `as_strided()`; the unbacked SymInt is then the FINAL offset.
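A rough sketch of what this suggestion might look like (assumptions: a dim-0 slice of a contiguous inner tensor `t`, with `start`/`length` already checked in range; illustrative only, not the PR's final code):

```python
import torch

def slice_via_as_strided(t: torch.Tensor, start, length) -> torch.Tensor:
    new_size = (length,) + tuple(t.shape[1:])
    # Fold any existing offset in directly, so the unbacked SymInt appears in
    # the FINAL storage offset instead of composing through nested views.
    new_offset = t.storage_offset() + t.stride(0) * start
    return t.as_strided(new_size, t.stride(), new_offset)
```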

@jbschlosser (Contributor, Author):

@ezyang turns out if I go this route, most of the fixes are no longer needed, AND narrow-on-narrow works.

1. Inductor-side `clamp()` checking for statically-known info: unneeded
2. Dynamic shapes support for pulling an unbacked SymInt out of `s0 * u0`: unneeded
3. `call_method` support in `_node_metadata_hook()`: needed

Do you think it's worth still following through with 1 or 2?

@ezyang (Contributor):

The clamp fix is harmless, I think; it could potentially result in slightly better code for unbacked SymInts. I think you did convince me that, in principle, pulling u0 out of `s0 * u0` is potentially useful in the same way divisibility is useful. I'm fine if you drop both, though; we can always do them later if we need them.

@ezyang (Contributor) commented Dec 6, 2024

From a 1:1 discussion: the current design doesn't work if you narrow a narrow; Joel knows how to fix it.

@ezyang (Contributor) left a review comment:

I would still much prefer this case be unified with the constant literal one

@jbschlosser (Contributor, Author) commented Dec 19, 2024

> I would still much prefer this case be unified with the constant literal one

Will do :) Edit: done.

…e() bounds"


Related: #125914 (specifically see [comment](#125914 (comment)))

This PR addresses two broken things involving the usage of unbacked SymInts for calls to `slice()` with data-dependent bounds. These issues are encountered in practice for `narrow()` operating on the batch dim with an NJT input, but apply to other subclasses as well. The test in this PR uses a purpose-built subclass.

There are two different issues here, depending on whether `torch.compile()` is called with `dynamic=True`. In practice, these only occur when the unbacked SymInts are created within the torch_dispatch implementation of a subclass, because the unbacked symbols are considered "freshly created" when the output subclass instance is handled in Dynamo.

**Error 1 (dynamic=False):**
```
LoweringException: GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(-Min(22, Max(0, u0)) + Min(22, Max(u0 + u1, Max(0, u0))), 0) (unhinted: Eq(-Min(s0, Max(0, u0)) + Min(s0, Max(u0 + u1, Max(0, u0))), 0)).  (Size-like symbols: u1, u0)
```

The expression comes from the use of `clamp()` logic for `SliceView` in Inductor:
https://github.com/pytorch/pytorch/blob/41e59754b407533b060b874c22ca4feda38bd83a/torch/_inductor/ir.py#L3014

If the (start, end) bounds for the `slice()` are statically known to be in range for the given dim (e.g. provided via `torch._check()` calls), we can avoid this `clamp()` logic and the error. This PR implements this fix.

**Error 2 (dynamic=True):**
```
torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {u0} not in returned outputs NestedTensor(size=(2, s16, s1), offsets=FakeTensor(..., device='cuda:0', size=(3,), dtype=torch.int64), grad_fn=<NarrowBackwardAutogradNestedTensor0 object at 0x7f1f8603cfd0>, contiguous=True) ((s1*s16, s1, 1), s1*u0)
```

The storage offset of the values component of the returned NJT is `s1*u0` where `s1` is known to be an integer. This PR expands the special logic handling the `constant * u0` case to handle SymInts as well:
https://github.com/pytorch/pytorch/blob/314e08eb52ad0e9b1c3eb6e149ec8a452e05b9c3/torch/fx/experimental/symbolic_shapes.py#L1013-L1031

cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv voznesenskym penguinwu Guobing-Chen XiaobingSuper zhuhaozhe blzheng jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Dec 19, 2024
…(non-dynamic) (#143526)

Lifted non-controversial (non-dynamic) fixes from #142062. See description there for context.

Pull Request resolved: #143526
Approved by: https://github.com/ezyang
@jbschlosser (Contributor, Author):
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 19, 2024
@pytorchmergebot (Collaborator):
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/jbschlosser/207/head branch January 19, 2025 02:08