Update tensorify pass to specialize symfloats we didn't tensorify away #138868
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138868
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit ed8dabc with merge base d8f99f3:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…nsorify away" As discussed w/ ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things: 1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation) 2) It updates the tensorify pass to do the backup specialization This pass was originally part of the [PR](#137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows: 1) Integrate turning off specialize float only in the automatic dynamic pass. 2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised. 3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised. [ghstack-poisoned]
…nsorify away" As discussed w/ ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things: 1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation) 2) It updates the tensorify pass to do the backup specialization This pass was originally part of the [PR](#137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows: 1) Integrate turning off specialize float only in the automatic dynamic pass. 2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised. 3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised. [ghstack-poisoned]
…nsorify away" As discussed w/ ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things: 1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation) 2) It updates the tensorify pass to do the backup specialization This pass was originally part of the [PR](#137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows: 1) Integrate turning off specialize float only in the automatic dynamic pass. 2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised. 3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised. [ghstack-poisoned]
…nsorify away" As discussed w/ ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things: 1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation) 2) It updates the tensorify pass to do the backup specialization This pass was originally part of the [PR](#137782) that flips `specialize_float=False` but we learned that the blast radius is simply too large. We've pivoted to a more milestone driven approach where we learn from the failures of the aforementioned PR and cherry pick fixes into main first. After this current PR lands our strategy is as follows: 1) Integrate turning off specialize float only in the automatic dynamic pass. 2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat related bugs in dynamo due to code paths we previously never exercised. 3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat related bugs in aotautograd due to code paths we previously never exercised. [ghstack-poisoned]
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -f "ci failures unrelated"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'Sorry for reverting your change but I think the new tests are failing on fbcode' -c ghfirst

The diff is D65247988 and the failure is:
@pytorchbot successfully started a revert job. Check the current status here.
Revert "Update tensorify pass to specialize symfloats we didn't tensorify away (#138868)"

This reverts commit a494572.

Reverted #138868 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the new tests are failing on fbcode ([comment](#138868 (comment)))
@bobrenjc93 your PR has been successfully reverted.
Sigh, it's related to T205068920. Will re-land once https://www.internalfb.com/confighub/change/configo/4269739835 pushes.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@bobrenjc93 I still see the same failure on the diff D65342565, please help take a look. We would either need to come up with a fix quickly, or I need to revert it for a reland. Keeping a diff for too long in the train opens up conflicts between GitHub and fbcode.
Update tensorify pass to specialize symfloats we didn't tensorify away (pytorch#138868)

As discussed w/ @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:

1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation).
2) It updates the tensorify pass to do the backup specialization.

This pass was originally part of the [PR](pytorch#137782) that flips `specialize_float=False`, but we learned that the blast radius is simply too large. We've pivoted to a more milestone-driven approach where we learn from the failures of the aforementioned PR and cherry-pick fixes into main first. After this current PR lands, our strategy is as follows:

1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat-related bugs in dynamo due to code paths we previously never exercised.
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat-related bugs in aotautograd due to code paths we previously never exercised.

Pull Request resolved: pytorch#138868
Approved by: https://github.com/ezyang
Stack from ghstack (oldest at bottom):
As discussed w/ @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:

1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation).
2) It updates the tensorify pass to do the backup specialization.

This pass was originally part of the PR that flips `specialize_float=False`, but we learned that the blast radius is simply too large. We've pivoted to a more milestone-driven approach where we learn from the failures of the aforementioned PR and cherry-pick fixes into main first. After this current PR lands, our strategy is as follows:

1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat-related bugs in dynamo due to code paths we previously never exercised.
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat-related bugs in aotautograd due to code paths we previously never exercised (see the sketch below).

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
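As an illustration of the canary step above, the following is a hypothetical harness (names and structure invented here, not part of this PR) that exercises the same float-flavored function under `backend="eager"` and `backend="aot_eager"` with `specialize_float` turned off, which is one way to localize whether a symfloat bug lives in dynamo or in aot_autograd.

```python
# Hypothetical canary harness (illustrative only): run a float-heavy function
# under both debugging backends with specialize_float disabled, so a failure
# under "eager" points at dynamo while a failure only under "aot_eager"
# points at aot_autograd.
import torch
import torch._dynamo as dynamo

def f(x, s: float):
    return ((x + s) * s).sum()

def run_canary(backend: str) -> None:
    dynamo.reset()  # clear compiled caches between canary runs
    dynamo.config.specialize_float = False
    compiled = torch.compile(f, backend=backend, dynamic=True)
    x = torch.randn(8)
    for s in (0.5, 1.5, 2.5):  # vary the float to exercise symfloat code paths
        compiled(x, s)

if __name__ == "__main__":
    for backend in ("eager", "aot_eager"):
        run_canary(backend)
        print(f"{backend}: ok")
```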