Conversation

@ezyang (Contributor) commented Dec 11, 2022

Signed-off-by: Edward Z. Yang <[email protected]>

[ghstack-poisoned]
@pytorch-bot added the release notes: fx label Dec 11, 2022
@pytorch-bot (bot) commented Dec 11, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90665

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 58b38cc:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Dec 11, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: a663a98
Pull Request resolved: #90665
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
Comment on lines +2003 to +2016
# Can we actually demonstrate that checkpointing the ShapeEnv is
# necessary? It's not so easy to induce this case. Dynamo is very
# eager about adding locals to GraphArgs; any local that is in scope,
# even if it isn't used, is added to GraphArgs (see also
# https://github.com/pytorch/torchdynamo/issues/1925 ). So long
# as Dynamo eagerly guards in this way, we have an invariant that
# all locals are guaranteed to show up in GraphArgs before the
# inlining function call, in which case we will always have enough
# information to codegen our guards so long as we don't prune the
# unused GraphArgs away (and indeed, the direct fix for this bug
# was to make sure we use original GraphArgs). Non locals,
# conversely, typically are static, and so won't have guards allocated
# for them. That being said, there may still be a way to trigger
# this error.
Collaborator:

This deserves its own test to protect this invariant. We take it as a matter of course for now, but if it changed subtly under the hood, it could cause issues that are really nasty to debug. Let's file an issue to test this (specifically, the exhaustive guarding of locals).

@ezyang (Contributor, Author):

This isn't an invariant. I just failed to come up with a test case that could actually exercise the naughtier fix.

The proper fix is to checkpoint the shape env properly, but I'm hoping we can put that off a bit.
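
A minimal illustrative sketch of the snapshot/rollback pattern that "checkpoint the shape env" refers to; the class and method names below are hypothetical, not PyTorch's actual ShapeEnv API:

```python
# Hypothetical sketch only: not PyTorch's ShapeEnv API.
import copy

class ToyShapeEnv:
    def __init__(self):
        self.guards = []

    def checkpoint(self):
        # Snapshot guard state before speculatively inlining a call.
        return copy.deepcopy(self.guards)

    def restore(self, snapshot):
        # Roll back any guards the failed speculation added.
        self.guards = snapshot

env = ToyShapeEnv()
snap = env.checkpoint()
env.guards.append("Eq(s0, 4)")  # guard added while tracing the inlinee
env.restore(snap)               # inlining aborted: discard it
assert env.guards == []
```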

@ezyang (Contributor, Author):

It turns out that you can trigger the edge case discussed here with unspecialized ints, and in fact quite easily.
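
A hypothetical sketch of what such a trigger could look like (my own illustration; the actual test case in the PR may differ):

```python
import torch

def inlined(x, n):
    # n is in scope but unused; Dynamo eagerly records locals in
    # GraphArgs, and the symbol allocated for an unspecialized int
    # must still be resolvable when guards are code-generated.
    return x + 1

@torch.compile(dynamic=True)
def f(x, n):
    return inlined(x, n)

f(torch.randn(3), 7)
f(torch.randn(3), 8)  # varying the int keeps n unspecialized
```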

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 11, 2022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: d4ed60d
Pull Request resolved: #90665
@ezyang (Contributor, Author) commented Dec 11, 2022

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label Dec 11, 2022
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 2 additional jobs have failed; the first few of them are: inductor, inductor / cuda11.6-py3.10-gcc7-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team: raised by workflow job.

@ezyang (Contributor, Author) commented Dec 12, 2022

@pytorchbot merge -f "master failure only"

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@bhack (Contributor) commented Jun 5, 2024

I had an error on nightly mentioning this PR:

```
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: s6 (could be from ["L['self'].last_size_2d[0]"]) not in {s1: ["L['t0']"], s10: ["L['t1']"], s7: ["L['t3']"], s6: []}.  If this assert is failing, it could be due to the issue described in https://github.com/pytorch/pytorch/pull/90665

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
```

What info do you need?

@ezyang (Contributor, Author) commented Jun 6, 2024

Are you running with the cache by any chance? If so, it's probably #127970.

@bhack (Contributor) commented Jun 6, 2024

Yes, it is, with `TORCHINDUCTOR_FX_GRAPH_CACHE=1`.
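
For anyone trying to reproduce: the cache in question is inductor's FX graph cache. A sketch of two ways to enable it (the config attribute name is my assumption about `torch._inductor.config`):

```python
import os

# The environment variable from the comment above; set it before compiling.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"

# Assumed-equivalent config knob (name not verified against every release).
import torch._inductor.config as inductor_config
inductor_config.fx_graph_cache = True
```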

davidberard98 added a commit that referenced this pull request Jun 13, 2024
When we don't dynamo.reset(), we don't recompile on different dynamic shapes.

Additionally, this exposes a bug that fails with the view operation:

```
  File "/home/dberard/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 3996, in produce_guards
    sexpr = ShapeGuardPrinter(symbol_to_source, source_ref, self.var_to_sources).doprint(expr)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 292, in doprint
    return self._str(self._print(expr))
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 56, in _print_Add
    t = self._print(term)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 366, in _print_Mul
    a_str = [self.parenthesize(x, prec, strict=False) for x in a]
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 366, in <listcomp>
    a_str = [self.parenthesize(x, prec, strict=False) for x in a]
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 37, in parenthesize
    return self._print(item)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 1494, in _print_Symbol
    assert self.symbol_to_source.get(expr), (
AssertionError: s3 (could be from ['<ephemeral: symint_visitor_fn>', '<ephemeral: symint_visitor_fn>']) not in {s0: ["L['x'].a.size()[1]", "L['x'].b.size()[1]", "L['x'].size()[1]", "L['x'].a.size()[1]", "L['x'].b.size()[1]", "L['x'].a.size()[1]", "L['x'].b.size()[1]"], s1: ["L['x'].a.stride()[0]", "L['x'].b.stride()[0]", "L['x'].stride()[0]", "L['x'].a.stride()[0]", "L['x'].b.stride()[0]", "L['x'].a.stride()[0]", "L['x'].b.stride()[0]"], s2: ["L['x'].a.storage_offset()", "L['x'].b.storage_offset()", "L['x'].a.storage_offset()", "L['x'].b.storage_offset()"]}.  If this assert is failing, it could be due to the issue described in #90665
```

[ghstack-poisoned]
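
For context on the first line of the commit message above, a small usage sketch of the reset-between-runs pattern (illustrative only, not code from the PR):

```python
import torch
import torch._dynamo

@torch.compile
def f(x):
    return x * 2

f(torch.randn(4))
torch._dynamo.reset()  # clear Dynamo caches so the next call re-traces
f(torch.randn(8))      # traced fresh; without the reset, cached guards
                       # can hide the recompile the test wants to exercise
```
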
davidberard98 added a commit that referenced this pull request Jun 13, 2024
When we don't dynamo.reset(), we don't recompile on different dynamic shapes.

Also, some of the returned views were tuples, so when we `* 2` them, we actually just copied all the inputs twice in the tuple. I changed it so that it returns just one of the values from the tuple.

Additionally, this exposes a bug that fails with the slice operation, so I skipped it when we're testing with dynamic shapes:

```
  File "/home/dberard/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 3996, in produce_guards
    sexpr = ShapeGuardPrinter(symbol_to_source, source_ref, self.var_to_sources).doprint(expr)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 292, in doprint
    return self._str(self._print(expr))
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 56, in _print_Add
    t = self._print(term)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 366, in _print_Mul
    a_str = [self.parenthesize(x, prec, strict=False) for x in a]
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 366, in <listcomp>
    a_str = [self.parenthesize(x, prec, strict=False) for x in a]
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/str.py", line 37, in parenthesize
    return self._print(item)
  File "/home/dberard/local/miniconda3/envs/pytorch/lib/python3.10/site-packages/sympy/printing/printer.py", line 331, in _print
    return printmethod(expr, **kwargs)
  File "/home/dberard/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 1494, in _print_Symbol
    assert self.symbol_to_source.get(expr), (
AssertionError: s3 (could be from ['<ephemeral: symint_visitor_fn>', '<ephemeral: symint_visitor_fn>']) not in {s0: ["L['x'].a.size()[1]", "L['x'].b.size()[1]", "L['x'].size()[1]", "L['x'].a.size()[1]", "L['x'].b.size()[1]", "L['x'].a.size()[1]", "L['x'].b.size()[1]"], s1: ["L['x'].a.stride()[0]", "L['x'].b.stride()[0]", "L['x'].stride()[0]", "L['x'].a.stride()[0]", "L['x'].b.stride()[0]", "L['x'].a.stride()[0]", "L['x'].b.stride()[0]"], s2: ["L['x'].a.storage_offset()", "L['x'].b.storage_offset()", "L['x'].a.storage_offset()", "L['x'].b.storage_offset()"]}.  If this assert is failing, it could be due to the issue described in #90665
```

cc cpuhrsch jbschlosser bhosmer drisspg soulitzer ezyang albanD voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Jun 14, 2024

Pull Request resolved: #128659
Approved by: https://github.com/YuqingJ
ignaciobartol pushed a commit to ignaciobartol/pytorch that referenced this pull request Jun 14, 2024

Pull Request resolved: pytorch#128659
Approved by: https://github.com/YuqingJ
@brthor commented Jun 30, 2024

I'm commenting here for anyone else who lands here by the link in the exception.

I hit this in pytorch 2.3.0.

The issue was that I was slicing the input tensor to my torch.compile'd network, which caused the strides to mismatch the shape.

I was doing this:

```
x = x[:, :, :new_size]
net(x)
```

Adding `.contiguous()` fixed this issue:

```
x = x[:, :, :new_size].contiguous()
net(x)
```

The error I was seeing:

```
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1] Error while creating guard:
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1] Name: ''
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Source: shape_env
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Create Function: SHAPE_ENV
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Guard Types: None
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Code List: None
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Object Weakref: None
E0630 04:11:58.231000 139761003709504 torch/_guards.py:251] [1/1]     Guarded Class Weakref: None
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1] Created at:
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 482, in transform
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]     tracer = InstructionTranslator(
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2060, in __init__
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]     output=OutputGraph(
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 310, in __init__
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]     self.init_ambient_guards()
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 421, in init_ambient_guards
E0630 04:11:58.231000 139761003709504 torch/_guards.py:253] [1/1]     self.guards.add(ShapeEnvSource().make_guard(GuardBuilder.SHAPE_ENV))

...

AssertionError: s2 (could be from ["L['var_x']._base.stride()[0]"]) not in {s1: [], s2: []}.  If this assert is failing, it could be due to the issue described in https://github.com/pytorch/pytorch/pull/90665
```

I think there is probably a bug here though.
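
To make the report above self-contained, a hedged sketch of the pattern (the network and sizes are placeholders I chose, not from the original report):

```python
import torch

net = torch.compile(torch.nn.Conv1d(4, 4, kernel_size=1), dynamic=True)

x = torch.randn(2, 4, 16)
new_size = 10

# Slicing the last dim keeps the base tensor's strides, so the view is
# non-contiguous: the shape/stride mismatch described above.
y = x[:, :, :new_size]
out = net(y)  # in the reporter's setup, this path hit the assertion

# The workaround from the comment: materialize a contiguous copy first.
out = net(x[:, :, :new_size].contiguous())
```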

@ezyang (Contributor, Author) commented Jun 30, 2024

Oh, for yours, it's possible nightly has fixed it. I think I need to update the error description here.

jbschlosser added a commit that referenced this pull request Sep 11, 2024


**TL;DR:** This PR does the following hacks:
* Allows SymInt replacements for expressions with components that are integers, reciprocals (e.g. `1/s0`), and multiplications of symbols with reciprocals (e.g. `s1/s0`). Previously, only integers were [allowed](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4970). For example, when we find `Eq(s2, s1/s0)`, we can now replace `s2` with `s1/s0`.
* Approximates the value range for `1/s0` with positive integer `s0` as `[0, 1]`.

**Background:** During subclass fake-ification, subclass instances that are views go through the process of "view replay" to reconstruct the base -> view relationship in fake land. Part of view replay involves symbolicizing any closed-over ints in the original view function. For example, a slice view `x[3]` will have a `3` baked in as an arg to slice in the saved view func; we want to symbolicize this 3 for dynamic shapes support.

In practice, before the view is replayed, all closed-over ints are replaced with SymInts with "ephemeral sources" that are intended to disappear when we [assert](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/_subclasses/meta_utils.py#L1073-L1078) the subclass size / stride / storage offset match the allocated, non-ephemeral outer symbolic values. For this to work, we rely on the [SymInt replacement logic](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4740-L4744) (e.g. replace `s1` with `s0 * 5` if `Eq(s1, s0*5)` is found to be true), and [prioritize ephemerally-sourced SymInts](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4909-L4921) to be replaced first.

**Problem:** In certain cases, the SymInt replacement logic does not apply, and an ephemerally-sourced SymInt lives for longer than originally designed, resulting in guard time errors like those reported [here](#133337 (comment)) and in #128649.
```
AssertionError: s7 (could be from ['<ephemeral: symint_visitor_fn>', '<ephemeral: symint_visitor_fn>']) not in {s3: ["L['nt']._values.size()[0]", "L['nt']._values.size()[0]"], s0: ["L['nt'].size()[1]"], s6: ["L['nt'].size()[2]", "L['nt']._values.size()[1]"], s1: ["L['nt']._values.size()[0]"], s7: []}.  If this assert is failing, it could be due to the issue described in #90665
```

**Solution:** For each of the error cases, we can identify why a particular SymInt replacement is not made and enhance the logic to support such replacements. This PR addresses the following case:
```python
@torch.compile(backend="eager", dynamic=True)
def f(t):
    return t._base + 1

x_a = torch.randn(4, 4, requires_grad=True)
x = TwoTensor(x_a, x_a.clone())
out = f(x[3])
```
The input to the compiled func is a subclass view produced by `slice()`. Fake-ification of this subclass view does the following:
1. Allocates non-ephemeral size / stride / storage offset symbols for the outer subclass metadata. Since the slice is contiguous with shape `(4)`, we get `(s0)` for the shape, `(1)` for the stride, and `s1` for the storage offset.
2. Perform view replay on a fake-ified base to reconstruct the base -> view relationship. The captured `3` arg to `slice()` is replaced with an ephemerally-sourced SymInt `s2` before the view func is replayed. The output of view replay has shape `(s0)`, stride `(1)`, and `s0 * s2` for the storage offset.
3. [Asserts](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/_subclasses/meta_utils.py#L1073-L1078) that the output of view replay has the shape metadata allocated in step (1). Since we have `Eq(s1, s0 * s2)` from the storage offset assert and `s2` is ephemeral, we should replace `s2` with `s1/s0`. Before this PR, this replacement was not supported since `s1/s0` is not guaranteed to be an integer. Note that the `try_solve()` call below is correctly able to find `s2=s1/s0`.
https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4969-L4970

This PR hacks around this, expanding the supported set of replacements to include both reciprocals (e.g. `1/s0`, represented as `Pow(s0, -1)`) and multiplications of symbols with reciprocals (e.g. `s1/s0`, which is represented as `Mul(s1, Pow(s0, -1))`).

To make a replacement, it is also necessary to determine a new value range for the expression, and the new value range must be a subset of the existing value range for the symbol to be replaced. Before this PR, the logic did not know how to calculate a new value range for a reciprocal `Pow(s0, -1)`, only supporting positive exponents. This PR hacks in a rough loose bound for the expected value range of `Pow(s0, -1)` AKA `1/s0` given the value range of `s0` for positive integer `s0`. To keep value ranges integral so as to operate well with other integral value ranges, the value range for `1/s0` is approximated as `[0, 1]`.

In an expression like `s1/s0`, the lower / upper bounds for the ranges of `s1` and `1/s0` are multiplied to determine the final value range for the full expression. For example, given a range of `[0, int_oo]` for `s1` and the approximated `[0, 1]` range for `1/s0`, the final range for `s1/s0` is also found to be `[0, int_oo]`.
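
A rough illustration of the interval arithmetic in the last two paragraphs (a sketch under the stated positive-integer assumption, not the actual ValueRanges implementation):

```python
import math

def recip_range(lo, hi):
    # For positive integer s0 in [lo, hi], 1/s0 lies in (0, 1]; keeping
    # ranges integral approximates this as [0, 1], as described above.
    assert lo >= 1
    return (0, 1)

def mul_range(a, b):
    # Multiply endpoint combinations; take 0 * oo as 0 so that
    # [0, oo] * [0, 1] stays well-defined.
    def mul(x, y):
        return 0 if 0 in (x, y) else x * y
    products = [mul(x, y) for x in a for y in b]
    return (min(products), max(products))

s1_range = (0, math.inf)  # stand-in for [0, int_oo]
print(mul_range(s1_range, recip_range(2, math.inf)))  # (0, inf)
```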

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
jbschlosser added a commit that referenced this pull request Sep 23, 2024
**TL;DR:** This PR does the following:
* Uses `CleanDiv` instead of `FloorDiv` in more places when we know it holds to get strictly more information:
    * In `_try_isolate_lhs()`, if we know that `lhs` and `rhs` are integers, we can solve via `CleanDiv` instead of `/`
        * The PR teaches `_try_isolate_lhs()` how to handle `CleanDiv(numer, denom)` where the `denom` component contains the thing we're trying to isolate. This happens in practice for `slice()` and `view()` views
    * In the `infer_size()` util used by various views (including `view()` and `reshape()`), use `CleanDiv` for inferring the `-1` dimension from the `total size / product of other dims` (a sketch follows below this list)
        * We also learn that all components of the product are >= 1, so provide that information as well
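
A minimal sketch of the `-1` inference mentioned in the list above (illustrative Python, not the actual `infer_size()` implementation; the divisibility assert is what `CleanDiv` encodes symbolically):

```python
def infer_size(total_numel, shape):
    known = 1
    infer_idx = None
    for i, d in enumerate(shape):
        if d == -1:
            infer_idx = i
        else:
            known *= d
    out = list(shape)
    if infer_idx is not None:
        # CleanDiv's guarantee: the division is exact.
        assert total_numel % known == 0
        out[infer_idx] = total_numel // known
    return out

print(infer_size(12, (3, -1)))  # [3, 4]
```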

**Background:** During subclass fake-ification, subclass instances that are views go through the process of "view replay" to reconstruct the base -> view relationship in fake land. Part of view replay involves symbolicizing any closed-over ints in the original view function. For example, a slice view `x[3]` will have a `3` baked in as an arg to slice in the saved view func; we want to symbolicize this 3 for dynamic shapes support.

In practice, before the view is replayed, all closed-over ints are replaced with SymInts with "ephemeral sources" that are intended to disappear when we [assert](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/_subclasses/meta_utils.py#L1073-L1078) the subclass size / stride / storage offset match the allocated, non-ephemeral outer symbolic values. For this to work, we rely on the [SymInt replacement logic](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4740-L4744) (e.g. replace `s1` with `s0 * 5` if `Eq(s1, s0*5)` is found to be true), and [prioritize ephemerally-sourced SymInts](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4909-L4921) to be replaced first.

**Problem:** In certain cases, the SymInt replacement logic does not apply, and an ephemerally-sourced SymInt lives for longer than originally designed, resulting in guard time errors like those reported [here](#133337 (comment)) and in #128649.
```
AssertionError: s7 (could be from ['<ephemeral: symint_visitor_fn>', '<ephemeral: symint_visitor_fn>']) not in {s3: ["L['nt']._values.size()[0]", "L['nt']._values.size()[0]"], s0: ["L['nt'].size()[1]"], s6: ["L['nt'].size()[2]", "L['nt']._values.size()[1]"], s1: ["L['nt']._values.size()[0]"], s7: []}.  If this assert is failing, it could be due to the issue described in #90665
```

**Solution:** For each of the error cases, we can identify why a particular SymInt replacement is not made and enhance the logic to support such replacements. This PR addresses this for `slice()` and `view()` views that have been encountered in practice. For example:
```python
@torch.compile(backend="eager", dynamic=True)
def f(t):
    return t._base + 1

x_a = torch.randn(4, 4, requires_grad=True)
x = TwoTensor(x_a, x_a.clone())
out = f(x[3])
```
The input to the compiled func is a subclass view produced by `slice()`. Fake-ification of this subclass view does the following:
1. Allocates non-ephemeral size / stride / storage offset symbols for the outer subclass metadata. Since the slice is contiguous with shape `(4)`, we get `(s0)` for the shape, `(1)` for the stride, and `s1` for the storage offset.
2. Perform view replay on a fake-ified base to reconstruct the base -> view relationship. The captured `3` arg to `slice()` is replaced with an ephemerally-sourced SymInt `s2` before the view func is replayed. The output of view replay has shape `(s0)`, stride `(1)`, and `s0 * s2` for the storage offset.
3. [Asserts](https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/_subclasses/meta_utils.py#L1073-L1078) that the output of view replay has the shape metadata allocated in step (1). Since we have `Eq(s1, s0 * s2)` from the storage offset assert and `s2` is ephemeral, we should replace `s2` with `CleanDiv(s1, s0)`. Before this PR, this replacement was not supported since `s1/s0` is not guaranteed to be an integer. With `CleanDiv`, we have the integer guarantee and the replacement can happen (see the sketch below).
https://github.com/pytorch/pytorch/blob/49e0b88aab0fd124c318fe1c59fbbd8726298338/torch/fx/experimental/symbolic_shapes.py#L4969-L4970
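
For concreteness, a minimal sympy sketch of the isolation in step 3; `CleanDiv` is PyTorch-internal, so exact division under an explicit integrality assumption stands in for it here:

```python
import sympy

s0, s1, s2 = sympy.symbols("s0 s1 s2", positive=True, integer=True)

# The storage-offset equality from step 3; s2 is the ephemeral symbol.
eq = sympy.Eq(s1, s0 * s2)

# Isolate s2. Given the CleanDiv-style guarantee that s0 divides s1
# exactly, s1/s0 is a legitimate integer replacement for s2.
(solution,) = sympy.solve(eq, s2)
print(solution)  # s1/s0
```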

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]

Labels

ciflow/inductor, ciflow/trunk, Merged, module: dynamo, release notes: fx
