
Conversation

@bobrenjc93
Contributor

@bobrenjc93 bobrenjc93 commented Oct 25, 2024

Stack from ghstack (oldest at bottom):

As discussed with @ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:

  1. It fixes a bug where `item_memo` gets dropped (due to incorrect epoch invalidation).
  2. It updates the tensorify pass to perform the backup specialization.

This pass was originally part of the PR (#137782) that flips `specialize_float=False`, but we learned that the blast radius is simply too large. We've pivoted to a more milestone-driven approach where we learn from the failures of the aforementioned PR and cherry-pick fixes into main first. After this PR lands, our strategy is as follows:

  1. Integrate turning off specialize float only in the automatic dynamic pass.
  2. Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat-related bugs in Dynamo due to code paths we previously never exercised.
  3. Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat-related bugs in AOTAutograd due to code paths we previously never exercised.
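
As context (not part of this diff), a minimal sketch of the user-visible behavior under discussion, assuming a plain Python float argument to a compiled function; the function and values are illustrative, not code from this PR:

    import torch

    def fn(x, scale):
        # `scale` is a backed float; with specialize_float=False Dynamo tries to
        # tensorify it, and with the backup path falls back to specializing on
        # its concrete value instead.
        return x * scale + scale

    fn_opt = torch.compile(fn, backend="eager")
    x = torch.randn(4)
    fn_opt(x, 2.0)
    fn_opt(x, 3.0)  # may recompile if 2.0 was specialized rather than tensorified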

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138868

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit ed8dabc with merge base d8f99f3:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@bobrenjc93 bobrenjc93 requested a review from ezyang October 25, 2024 01:57
@bobrenjc93 bobrenjc93 marked this pull request as ready for review October 25, 2024 01:57
@bobrenjc93 bobrenjc93 requested a review from bdhirsh as a code owner October 25, 2024 01:57
bobrenjc93 added a commit that referenced this pull request Oct 29, 2024
@bobrenjc93
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 30, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@bobrenjc93
Contributor Author

@pytorchbot merge -f "ci failures unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current, which continues the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.
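
For example (illustrative), a non-forced retry that lets pending jobs finish and ignores only the currently failing ones could be issued as:

    @pytorchbot merge -i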

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@huydhn
Contributor

huydhn commented Oct 31, 2024

@pytorchbot revert -m 'Sorry for reverting your change but I think the new tests are failing on fbcode' -c ghfirst

The diff is D65247988 and the failure is:

======================================================================
ERROR: test_unspecialized_float_fallback_specialization_cuda (caffe2.test.inductor.test_torchinductor_dynamic_shapes.TestInductorDynamicCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/testing/_internal/common_utils.py", line 3052, in wrapper
    method(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/testing/_internal/common_device_type.py", line 480, in instantiated_test
    raise rte
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test
    result = test(self, **param_kwargs)
  File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/caffe2/test/inductor/test_torchinductor_dynamic_shapes.py", line 1005, in test_unspecialized_float_fallback_specialization
    self.assertEqual(fn(x, 2.0, z), fn_opt(x, 2.0, z))
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/eval_frame.py", line 554, in _fn
    return fn(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 1428, in __call__
    return self._torchdynamo_orig_callable(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 1211, in __call__
    result = self._inner_convert(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 548, in __call__
    return _compile(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 981, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 707, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_utils_internal.py", line 350, in wrapper_function
    result = StrobelightCompileTimeProfiler.profile_compile_time(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/caffe2/fb/strobelight/compile_time_profiler.py", line 162, in profile_compile_time
    return func(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 742, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/bytecode_transformation.py", line 1337, in transform_code_object
    transformations(instructions, code_options)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 232, in _fn
    return fn(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/convert_frame.py", line 661, in transform
    tracer.run()
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/symbolic_convert.py", line 2909, in run
    super().run()
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/symbolic_convert.py", line 1115, in run
    while self.step():
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/symbolic_convert.py", line 1027, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/symbolic_convert.py", line 3100, in RETURN_VALUE
    self._return(inst)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/symbolic_convert.py", line 3085, in _return
    self.output.compile_subgraph(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/output_graph.py", line 1173, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/output_graph.py", line 1411, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/output_graph.py", line 1458, in call_user_compiler
    return self._call_user_compiler(gm)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/output_graph.py", line 1507, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/output_graph.py", line 1488, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/testing.py", line 233, in __call__
    return lookup_backend(self.backend)(gm, example_inputs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/backends/inductor.py", line 12, in inductor
    return compile_fx(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 1693, in compile_fx
    return aot_autograd(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/backends/common.py", line 72, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_functorch/aot_autograd.py", line 1081, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_functorch/aot_autograd.py", line 528, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_functorch/aot_autograd.py", line 780, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 196, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 1510, in fw_compiler_base
    return _fw_compiler_base(model, example_inputs, is_inference)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 1579, in _fw_compiler_base
    return inner_compile(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 578, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_dynamo/repro/after_aot.py", line 100, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/fb/utils.py", line 167, in newFunction
    return old_func(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 735, in _compile_fx_inner
    compiled_graph = FxGraphCache.load(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/codecache.py", line 1460, in load
    compiled_graph = compile_fx_fn(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 642, in codegen_and_compile
    compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/compile_fx.py", line 934, in fx_codegen_and_compile
    graph.run(*example_inputs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/graph.py", line 820, in run
    return super().run(*args)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/fx/interpreter.py", line 167, in run
    self.env[node] = self.run_node(node)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/graph.py", line 1411, in run_node
    result = super().run_node(n)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/fx/interpreter.py", line 228, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/graph.py", line 1060, in call_function
    raise LoweringException(e, target, args, kwargs).with_traceback(
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/graph.py", line 1057, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/lowering.py", line 401, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/dev/shm/uid-30083/48acfd58-seed-nspid4026537968_cgpid14780696-ns-4026537918/torch/_inductor/lowering.py", line 2841, in _local_scalar_dense
    V.graph.sizevars.shape_env, V.graph.current_node.meta["unbacked_bindings"]
torch._dynamo.exc.BackendCompilerFailed: backend='?' raised:
LoweringException: KeyError: 'unbacked_bindings'
  target: aten._local_scalar_dense.default
  args[0]: TensorBox(StorageBox(
    InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float64, size=[], stride=[]))
  ))

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
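
For reference, a minimal standalone sketch of the call pattern the failing test exercises, reconstructed from the traceback above (the function body and inputs are assumptions, not the actual fbcode test):

    import torch

    def fn(x, y, z):
        # z is assumed to be a 0-d float64 tensor; .item() yields the backed
        # symfloat that the tensorify pass tries to handle (or now specializes).
        return x * y + z.item()

    fn_opt = torch.compile(fn, backend="inductor", dynamic=True)
    x = torch.randn(4)
    z = torch.tensor(3.0, dtype=torch.float64)
    torch.testing.assert_close(fn(x, 2.0, z), fn_opt(x, 2.0, z))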

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 31, 2024
…rify away (#138868)"

This reverts commit a494572.

Reverted #138868 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the new tests are failing on fbcode ([comment](#138868 (comment)))
@pytorchmergebot
Collaborator

@bobrenjc93 your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Oct 31, 2024
@bobrenjc93
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@huydhn
Contributor

huydhn commented Nov 2, 2024

@bobrenjc93 I still see the same failure on the diff D65342565; please help take a look. We either need to come up with a fix quickly, or I will need to revert it for a reland. Keeping a diff in the train for too long opens up conflicts between GitHub and fbcode.

bobrenjc93 added a commit that referenced this pull request Nov 2, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024

Labels

ci-no-td, ciflow/inductor, ciflow/trunk, fx, Merged, module: inductor, release notes: fx, Reverted
