[POC] AOTInductor as Inductor backend #141700
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141700
Note: Links to docs will display an error until the docs builds have been completed.
❌ 53 New Failures, 2 Unrelated Failures
As of commit d058fc7 with merge base 0f261e8:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Signed-off-by: Edward Z. Yang <[email protected]>
ghstack-source-id: e88a8d6
Pull Request resolved: #141700
This PR needs a
```python
if graph is None:
    return None, cache_info

# See _save_graph(); we don't store the callable in the cache entry so
```
Review this with whitespace changes removed. I should have done this refactor before doing the prototype, but I didn't realize I needed to hit this.
```python
torch._inductor.config.aoti_wrapper = True
```
This is ABSOLUTELY not what the final API should be; it's just enough jank to get the test script to run.
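For context, a minimal sketch of how this POC toggle might be exercised, assuming this PR's branch (where the `aoti_wrapper` flag is defined; stock builds reject unknown Inductor config settings):

```python
import torch
import torch._inductor.config as inductor_config

# POC-only toggle from this PR: make torch.compile's Inductor backend emit an
# AOTInductor package instead of the usual Python wrapper. Not a stable API.
inductor_config.aoti_wrapper = True

def fn(x):
    return torch.sin(x) + torch.cos(x)

compiled_fn = torch.compile(fn)
print(compiled_fn(torch.randn(8)))
```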
```python
key = get_hash(":".join(compiled_fn), "", "code")
basename, subdir, path = get_path(key, "pt2", "")
pathlib.Path(path).parent.mkdir(parents=True, exist_ok=True)
output_code_log.debug("Package written to %s", path)
```
TODO: Need to do some CAS refactoring to make this better; package_aoti isn't really cooperating either. This write needs to be atomic, but not enough guts are exposed to do that cleanly.
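For reference, the conventional way to make this kind of cache write atomic is to stage the bytes into a temporary file in the destination directory and `os.replace()` it into place. A minimal sketch with illustrative names (not the PR's actual helpers):

```python
import os
import tempfile

def atomic_write_bytes(data: bytes, dest_path: str) -> None:
    """Write data to dest_path atomically: stage into a temp file in the same
    directory, then os.replace() so readers never observe a partial file."""
    dest_dir = os.path.dirname(dest_path)
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, dest_path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```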
```
@@ -0,0 +1,23 @@
import torch
```
Do we support @nocommit? I assume this file shouldn't make it into the final PR
This is just a POC; it needs lots of cleaning to actually land, so don't worry about it.
```python
    lambda: {"filename": artifact_path},
    payload_fn=lambda: code,
)
# TODO: This shoujld increment all the time
```
nit: typo
```python
# TODO: This could be better if we're ever able to serialize compiled
# models to disk.
disk_compiled_graph.current_callable = None
disk_compiled_graph.clear_uncacheable()
```
In #141502 I make a similar change (I called it prepare_for_serialization, but it's the same thing).
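A minimal sketch of the shared pattern, with illustrative class and field names (the real object is Inductor's compiled-graph cache entry): strip the process-local callable before pickling, and rebuild it when the entry is loaded from disk.

```python
import copy
import pickle

class CompiledGraphEntry:
    """Toy stand-in for a compiled-graph cache entry."""

    def __init__(self, source_code: str, current_callable):
        self.source_code = source_code
        self.current_callable = current_callable  # process-local, not picklable

    def prepare_for_serialization(self):
        """Drop fields that cannot be pickled before the entry is written to
        the on-disk cache; they are reconstructed on load."""
        self.current_callable = None

def save_entry(entry: CompiledGraphEntry, path: str) -> None:
    disk_entry = copy.copy(entry)           # keep the in-memory entry usable
    disk_entry.prepare_for_serialization()  # strip the unpicklable callable
    with open(path, "wb") as f:
        pickle.dump(disk_entry, f)
```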
Signed-off-by: Edward Z. Yang <[email protected]>
ghstack-source-id: c10f5b0
Pull Request resolved: #141700
… (pytorch#145381)

Summary:
Pull Request resolved: pytorch#145381

In this diff we are trying to introduce a stateful API to enable "fullgraph_package" mode, which will force Inductor to use AOTI as a backend. Different from PR pytorch#141700, we didn't try to populate the package file into the caching system; instead we bypass caching to simplify the implementation in its current form.

Similar to PR pytorch#141700, I did a quick benchmark of the loading time, and it looks like the following:

- Precompile
```
buck run mode/opt scripts/zhxchen17:precompile
```
- Load using cache:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader cache
```
Output:
```
real 0m24.593s
user 0m59.342s
sys 0m17.201s
```
- Load using load_fullgraph_package:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader precompile
```
Output:
```
real 0m10.907s
user 0m9.210s
sys 0m1.173s
```

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_fullgraph_package_basic_function

Differential Revision: D68459341
Summary:
Design doc: https://docs.google.com/document/d/1Z15cBBPjoZ7gH00TSgCdgaYko7a7Br-ERd3_hA-g2IU/edit?usp=sharing

In this diff we are trying to introduce a stateful API to enable a global mode which will force Inductor to use AOTI as a backend. Different from PR #141700, we didn't try to populate the package file into the caching system; instead we bypass caching to simplify the implementation in its current form.

Similar to PR #141700, I did a quick benchmark of the loading time, and it looks like the following:

- Precompile
```
buck run mode/opt scripts/zhxchen17:precompile
```
- Load using cache:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader cache
```
Output:
```
real 0m24.593s
user 0m59.342s
sys 0m17.201s
```
- Load using load_fullgraph_package:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader precompile
```
Output:
```
real 0m10.907s
user 0m9.210s
sys 0m1.173s
```

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_fullgraph_package_basic_function

Differential Revision: D68459341
Summary:
Design doc: https://docs.google.com/document/d/1Z15cBBPjoZ7gH00TSgCdgaYko7a7Br-ERd3_hA-g2IU/edit?usp=sharing

In this diff we are trying to introduce a new API, a pre-torch.compile() object, which will force Inductor to use AOTI as a backend. This differs from PR #141700.

Similar to PR #141700, I did a quick benchmark of the loading time, and it looks like the following:

- Precompile
```
buck run mode/opt scripts/zhxchen17:precompile
```
- Load using cache:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader cache
```
Output:
```
real 0m24.593s
user 0m59.342s
sys 0m17.201s
```
- Load using load_fullgraph_package:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader precompile
```
Output:
```
real 0m10.907s
user 0m9.210s
sys 0m1.173s
```

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_fullgraph_package_basic_function

Differential Revision: D68459341
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Stack from ghstack (oldest at bottom):
Stacked on #141691
Test transcript:
tlparse miss: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpUFFDtO/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
tlparse hit: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpv0D6hF/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
For comparison, here is the test script without AOTInductor as backend:
tlparse miss: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp7rrgy0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
tlparse hit: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpf5VKld/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
Although the AOTInductor code takes longer to compile, it is 2x as fast to load on cache hit. Furthermore, because its loading is very simple, it can be directly loaded for a precompile-style workflow.
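For readers who want to try the precompile-style flow this alludes to, here is a rough sketch using the standalone AOTInductor packaging entry points (exact signatures have shifted across releases, so treat this as approximate rather than as this PR's cache integration):

```python
import torch
import torch._inductor

class M(torch.nn.Module):
    def forward(self, x):
        return torch.sin(x) + torch.cos(x)

example_inputs = (torch.randn(8),)

# One-time, slower step: export the model and package the AOTInductor output
# as a self-contained .pt2 archive on disk.
ep = torch.export.export(M(), example_inputs)
package_path = torch._inductor.aoti_compile_and_package(
    ep, package_path="/tmp/model.pt2"
)

# Later, fast step: load the packaged artifact directly (no recompilation)
# and call it like an ordinary function.
loaded = torch._inductor.aoti_load_package(package_path)
print(loaded(*example_inputs))
```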
Signed-off-by: Edward Z. Yang [email protected]
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov