Conversation

@bobrenjc93 (Contributor) commented Oct 11, 2024

[ghstack-poisoned]
@pytorch-bot bot commented Oct 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137782

Note: Links to docs will display an error until the docs builds have been completed.

❌ 97 New Failures, 1 Unrelated Failure

As of commit e97ede6 with merge base c0879d0:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bobrenjc93 added a commit that referenced this pull request Oct 11, 2024
ghstack-source-id: 4b6f851
Pull Request resolved: #137782
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec

[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Oct 13, 2024
ghstack-source-id: 9a824db
Pull Request resolved: #137782
bobrenjc93 added a commit that referenced this pull request Oct 14, 2024
ghstack-source-id: 3be9f48
Pull Request resolved: #137782
bobrenjc93 added a commit that referenced this pull request Oct 14, 2024
ghstack-source-id: dcd6932
Pull Request resolved: #137782
This is the next step in supporting dynamic float arguments in PT2: https://docs.google.com/document/d/1HswUSp9H6mg8Vg27mhRk8YzC9q_uf63b6wz-gwx65BQ/edit?pli=1#heading=h.xvyiqp8tuje6. To make this more incremental and tractable, we've decided to opt the export path out of this first phase of the rollout.
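For context, a minimal sketch of the kind of dynamic float argument this rollout targets (the function and values below are illustrative, not code from this stack):

```python
import torch

# With the current default (specialize_float=True), Dynamo bakes each distinct
# float value of `scale` into the compiled graph as a constant, so a new value
# can mean a new compile. The goal of this stack is to treat such floats
# symbolically (as backed SymFloats) instead.
@torch.compile(backend="eager", dynamic=True)
def scale_tensor(x, scale: float):
    return x * scale

x = torch.randn(4)
scale_tensor(x, 0.5)
scale_tensor(x, 0.75)  # under specialization, the new float value triggers a recompile
```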

[ghstack-poisoned]
bobrenjc93 added a commit that referenced this pull request Oct 15, 2024
ghstack-source-id: 35d27d0
Pull Request resolved: #137782
pytorchmergebot pushed a commit that referenced this pull request Oct 24, 2024
As discussed with @ezyang, this set of diffs extracts fixes for problems discovered while flipping `specialize_float=False` in #137782. Since these codepaths are exercised by existing tests, I'm going to bias towards shipping speed and put these up with the global CI as the primary test plan. These code paths are all covered by existing tests when `specialize_float=False`, and it feels a bit wonky to add more gated tests that only test behavior when this flag is True, especially since these code paths are already covered. That said, I'm happy to add individual tests if reviewers insist or have a different POV.

Pull Request resolved: #138598
Approved by: https://github.com/ezyang
ghstack dependencies: #138595
bobrenjc93 added a commit that referenced this pull request Oct 24, 2024
…ats we didn't tensorify away"


As discussed with ezyang offline, one way to de-risk the `specialize_float=False` rollout is to specialize all backed symfloats that we fail to tensorify away. This diff does a few things:

1) It fixes a bug where item_memo gets dropped (due to incorrect epoch invalidation)
2) It updates the tensorify pass to do the backup specialization

This pass was originally part of the [PR](#137782) that flips `specialize_float=False`, but we learned that the blast radius is simply too large. We've pivoted to a more milestone-driven approach where we learn from the failures of the aforementioned PR and cherry-pick fixes into main first. After this current PR lands, our strategy is as follows:

1) Integrate turning off specialize float only in the automatic dynamic pass.
2) Put up a canary diff that only turns off specialize float in `backend=eager` mode to sniff out symfloat-related bugs in dynamo in code paths we previously never exercised (see the sketch after this list).
3) Put up a canary diff that only turns off specialize float in `backend=aot_eager` mode to sniff out symfloat-related bugs in aotautograd in code paths we previously never exercised.
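A minimal sketch of what such a canary could look like, assuming `torch._dynamo.config.specialize_float` is the flag in question (the function and values below are illustrative, not code from this stack):

```python
import torch
import torch._dynamo.config as dynamo_config

# Assumption: specialize_float is the Dynamo config flag this stack ultimately
# flips. Turning it off while compiling with backend="eager" keeps AOTAutograd
# and Inductor out of the picture, so new failures point at Dynamo's symfloat
# handling.
dynamo_config.specialize_float = False

@torch.compile(backend="eager")
def f(x, eps: float):
    return x / (x.norm() + eps)

f(torch.randn(8), 1e-5)
f(torch.randn(8), 1e-3)  # ideally reuses the graph via a backed SymFloat instead of recompiling
```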

[ghstack-poisoned]
bobrenjc93 closed this Oct 25, 2024
SamGinzburg pushed a commit that referenced this pull request Oct 28, 2024
…_ case (#138595)

Pull Request resolved: #138595
Approved by: https://github.com/ezyang
SamGinzburg pushed a commit that referenced this pull request Oct 28, 2024

Pull Request resolved: #138599
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit that referenced this pull request Oct 30, 2024
#138868)

Pull Request resolved: #138868
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2024
#138868)

Pull Request resolved: #138868
Approved by: https://github.com/ezyang
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
pytorch#138868)

Pull Request resolved: pytorch#138868
Approved by: https://github.com/ezyang
github-actions bot deleted the gh/bobrenjc93/66/head branch November 25, 2024 02:10