
Conversation

ezyang (Contributor) commented Nov 27, 2024

Stack from ghstack (oldest at bottom):

Stacked on #141691

Test transcript:

```
$ time TORCH_LOGS=output_code python a.py
...
I1127 08:46:55.864000 559792 torch/_inductor/codecache.py:1676] [0/0] [__output_code] Output code written to: /tmp/torchinductor_ezyang/ql/cqll2s5yvh2zgfwi3u7d4ywhbbds7m6tsrw43dja5zn7vxxr4cog.cpp
V1127 08:47:54.104000 559792 torch/_inductor/compile_fx.py:1077] [0/0] [__output_code] Package written to /tmp/torchinductor_ezyang/s7/cs7rsjlshhyvytdibm3i5bsuh6kndxtvxsemjlugjjezlz4426vv.pt2

real    2m38.385s
user    2m26.927s
sys     0m30.435s

$ time TORCH_LOGS=output_code python a.py

real    0m13.905s
user    0m13.348s
sys     0m9.563s
```

tlparse miss: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpUFFDtO/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
tlparse hit: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpv0D6hF/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

For comparison, here is the test script without AOTInductor as the backend:

```
$ time TORCH_LOGS=output_code python aaa.py
I1127 08:51:26.958000 659424 torch/_inductor/graph.py:2054] [0/0] [__output_code] Output code written to: /tmp/torchinductor_ezyang/pb/cpbwwdpm2jadqdp326sytbcrdj3uv7k4vqdvx5r3qzzaqrckr7uy.py

real    0m56.572s
user    0m51.647s
sys     0m11.317s

$ time python aaa.py

real    0m26.027s
user    0m19.065s
sys     0m11.066s
```

tlparse miss: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp7rrgy0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100
tlparse hit: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpf5VKld/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

Although the AOTInductor path takes longer to compile, it is roughly 2x faster to load on a cache hit (~14s vs ~26s). Furthermore, because loading it is very simple, the package can be loaded directly in a precompile-style workflow.
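For reference, a minimal sketch of what a test script along the lines of a.py might look like. This is hypothetical: the model and sizes are made up, and the torch._inductor.config.aoti_wrapper toggle is the prototype flag from this PR's diff (it does not exist in stock PyTorch).

```python
# Hypothetical repro in the spirit of a.py; model and sizes are made up.
import time

import torch

# Prototype toggle from this PR's diff; only exists with this patch applied
# and is explicitly not the final API (see review comments below).
torch._inductor.config.aoti_wrapper = True


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1024, 4096),
            torch.nn.ReLU(),
            torch.nn.Linear(4096, 1024),
        )

    def forward(self, x):
        return self.net(x)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = MLP().to(device)
compiled = torch.compile(model)

x = torch.randn(64, 1024, device=device)
start = time.time()
compiled(x)  # first call compiles (or loads the .pt2 package on a cache hit)
print(f"first call took {time.time() - start:.1f}s")
```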

Signed-off-by: Edward Z. Yang [email protected]

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot bot commented Nov 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141700

Note: Links to docs will display an error until the docs builds have been completed.

❌ 53 New Failures, 2 Unrelated Failures

As of commit d058fc7 with merge base 0f261e8:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Nov 27, 2024
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: e88a8d6
Pull Request resolved: #141700
github-actions (Contributor) commented:
This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

```python
if graph is None:
    return None, cache_info

# See _save_graph(); we don't store the callable in the cache entry so
```
ezyang (Contributor Author) commented:

Review this with whitespace changes removed. I should have done this refactor before doing the prototype, but I didn't realize I needed to hit this.

```python
torch._inductor.config.aoti_wrapper = True
```
ezyang (Contributor Author) commented:

This is ABSOLUTELY not what the final API should be; this is just enough jank to get the test script to run.

```python
key = get_hash(":".join(compiled_fn), "", "code")
basename, subdir, path = get_path(key, "pt2", "")
pathlib.Path(path).parent.mkdir(parents=True, exist_ok=True)
output_code_log.debug("Package written to %s", path)
```
ezyang (Contributor Author) commented:

TODO: Need to do some CAS refactoring to make this better. package_aoti isn't really cooperating either. This needs to be atomic, not enough guts exposed.
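For context on the atomicity concern, here is a generic sketch of the usual pattern (write to a temporary file in the destination directory, then os.replace it into place). This is only an illustration of the idea, not the refactor the TODO asks for, and the helper name is made up.

```python
# Illustration only: os.replace() is atomic when src and dst live on the same
# filesystem, so writing to a sibling temp file and renaming it into place
# avoids readers ever observing a partially written package.
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    dirname = os.path.dirname(path)
    os.makedirs(dirname, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomically move into the final location
    except BaseException:
        os.unlink(tmp_path)
        raise
```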

@ezyang ezyang requested a review from angelayi November 27, 2024 16:56
```
@@ -0,0 +1,23 @@
import torch
```
Contributor commented:

Do we support @nocommit? I assume this file shouldn't make it into the final PR

ezyang (Contributor Author) commented:

This is just a POC; it needs lots of cleaning to actually land. Don't worry about it.

lambda: {"filename": artifact_path},
payload_fn=lambda: code,
)
# TODO: This shoujld increment all the time
Contributor commented:

nit: typo

```python
# TODO: This could be better if we're ever able to serialize compiled
# models to disk.
disk_compiled_graph.current_callable = None
disk_compiled_graph.clear_uncacheable()
```
Contributor commented:

In #141502 I do a similar change (I called it prepare_for_serialization - but same thing)
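For readers following along, a generic sketch of the pattern being discussed (class, attribute, and function names are assumptions modeled on the quoted diff, not the actual implementation in either PR): strip the non-picklable callable before writing the graph to the cache, so only serializable state hits disk.

```python
# Sketch of the prepare-for-serialization idea; the graph object, its
# current_callable attribute, and the cache path are placeholders here.
import copy
import pickle


def prepare_for_serialization(compiled_graph):
    # Copy so the in-memory graph keeps its live (non-picklable) callable.
    serializable = copy.copy(compiled_graph)
    serializable.current_callable = None
    return serializable


def save_graph(cache_path: str, compiled_graph) -> None:
    with open(cache_path, "wb") as f:
        pickle.dump(prepare_for_serialization(compiled_graph), f)
```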

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Nov 28, 2024
Signed-off-by: Edward Z. Yang <[email protected]>

ghstack-source-id: c10f5b0
Pull Request resolved: #141700
@eellison eellison removed their request for review January 16, 2025 19:09
zhxchen17 added a commit to zhxchen17/pytorch that referenced this pull request Jan 22, 2025
…rch#145381)

Summary:
Pull Request resolved: pytorch#145381

In this diff we introduce a stateful API to enable a "fullgraph_package" mode, which forces Inductor to use AOTI as a backend. Unlike PR pytorch#141700, we don't try to populate the package file into the caching system; instead we bypass caching to simplify the implementation in its current form.

Similar to PR pytorch#141700, I did a quick benchmark of the loading time, and it looks like the following:
- Precompile
```
buck run mode/opt scripts/zhxchen17:precompile
```
- Load using cache:
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader cache
```
Output:
```
real    0m24.593s
user    0m59.342s
sys     0m17.201s
```
- Load using load_fullgraph_package
```
time buck run mode/opt scripts/zhxchen17:precompile -- --loader precompile
```
Output:
```
real    0m10.907s
user    0m9.210s
sys     0m1.173s
```

Test Plan:
buck run mode/opt caffe2/test:test_export -- -r test_fullgraph_package_basic_function

Differential Revision: D68459341
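The scripts/zhxchen17:precompile harness referenced above is internal; below is a rough, hypothetical sketch of what such a benchmark might look like against the public AOTInductor packaging APIs. The --loader flag mirrors the commit message; the model, package path, and exact API signatures are assumptions and may differ across PyTorch versions.

```python
# Hypothetical benchmark harness; not the script used for the numbers above.
import argparse
import time

import torch


class MLP(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.T)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--loader", choices=["cache", "precompile"], default=None)
    args = parser.parse_args()

    model = MLP()
    x = torch.randn(1024, 1024)
    pkg = "/tmp/precompile_demo.pt2"  # placeholder path

    if args.loader is None:
        # Precompile step: export and package the model ahead of time.
        ep = torch.export.export(model, (x,))
        torch._inductor.aoti_compile_and_package(ep, package_path=pkg)
        return

    start = time.time()
    if args.loader == "precompile":
        compiled = torch._inductor.aoti_load_package(pkg)  # direct package load
    else:
        compiled = torch.compile(model)  # relies on warm Inductor caches
    compiled(x)
    print(f"{args.loader} load + first call: {time.time() - start:.1f}s")


if __name__ == "__main__":
    main()
```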
zhxchen17 added a commit to zhxchen17/pytorch that referenced this pull request Jan 22, 2025
zhxchen17 added a commit to zhxchen17/pytorch that referenced this pull request Jan 23, 2025
pytorch-bot bot pushed a commit that referenced this pull request Jan 24, 2025
Summary:

Design doc: https://docs.google.com/document/d/1Z15cBBPjoZ7gH00TSgCdgaYko7a7Br-ERd3_hA-g2IU/edit?usp=sharing

In this diff we introduce a stateful API to enable a global mode that forces Inductor to use AOTI as a backend. Unlike PR #141700, we don't try to populate the package file into the caching system; instead we bypass caching to simplify the implementation in its current form.


@jansel jansel removed their request for review February 18, 2025 19:47
zhxchen17 added a commit that referenced this pull request Feb 20, 2025
Summary:

Design doc: https://docs.google.com/document/d/1Z15cBBPjoZ7gH00TSgCdgaYko7a7Br-ERd3_hA-g2IU/edit?usp=sharing

In this diff we are trying to introduce a new API for precompiling a torch.compile() object, which will force Inductor to use AOTI as a backend (unlike PR #141700).

github-actions (Contributor) commented:

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Apr 19, 2025
@github-actions github-actions bot closed this May 19, 2025
@github-actions github-actions bot deleted the gh/ezyang/3031/head branch June 19, 2025 02:18