Add support for cat memory planning mms with max autotune #132554
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132554
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit 914f5ab with merge base d1b87e2. FLAKY - the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Chillee left a comment
I haven't looked at this carefully yet, but will this work with all Triton templates? Thinking about FlexAttention here.
Ugh, it could, but it doesn't right now because I only implemented this for MultiTemplateBuffer, and flex_attention has … But I wanted to resolve the current, non-max-autotune handling of mms first; then I can handle it. As with the above, I think returning FlexibleLayout for aten.mm is misleading/buggy. We should not rely on that for cat planning. Options:
I don't actually understand this? Isn't …
Hmm, maybe. I don't know what would actually happen if you pass cuBLAS a weird output stride. I think you are correct that it would work, but we also don't want to pass a transposed output to a cuBLAS kernel and get a bunch of discontiguous writes. Do we actually want the output strides of mms to be flexible?
Hmm, at least this was about equal. Similarly for FlexAttention: if we just change the layout to be FlexibleLayout, this would work today, but are you okay with the output strides potentially being non-contiguous?
Yeah, I think so. Well, I'd definitely want them to be "contiguous enough".
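For concreteness, a small eager-mode illustration of the stride concern being discussed (this is not Inductor code; the shapes and names are made up): the same matmul values can be materialized with row-major strides or with transposed strides, and a kernel asked to write the latter no longer writes contiguously.

```python
import torch

# Eager-mode sketch of the concern above (illustrative shapes only):
# the same matmul result can carry row-major strides or transposed strides;
# asking a GEMM kernel to write the transposed layout scatters its writes.
a = torch.randn(128, 64)
b = torch.randn(64, 256)

c_row_major = a @ b                   # shape (128, 256), strides (256, 1)
c_transposed = (b.t() @ a.t()).t()    # same values, strides (1, 128)

print(c_row_major.stride())    # (256, 1)  -> contiguous writes
print(c_transposed.stride())   # (1, 128)  -> "transposed", discontiguous writes
assert torch.allclose(c_row_major, c_transposed, atol=1e-5)
```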
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased |
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 1 check: inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_freezing_avx2_timm, 2, 2, lf.linux.10xlarge.avx2). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m 'Sorry for reverting your change but I think it is failing on ROCm' -c nosignal
Failing test: inductor/test_max_autotune.py::TestMaxAutotune::test_conv_cat (GH job link, HUD commit link)
@pytorchbot successfully started a revert job. Check the current status here.
…32554)" This reverts commit d558ec0. Reverted #132554 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think it is failing on ROCm ([comment](#132554 (comment)))
|
@eellison your PR has been successfully reverted. |
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
When we are autotuning matmuls, the aten.mm and Triton template choices take in an externally allocated tensor that can be a view into a pre-planned aten.cat. As long as the output shape and stride of the matmul match the slice of the cat we're planning, we can realize the mm directly into the cat.
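A minimal eager-mode analogue of the idea (illustrative tensor names and shapes, not the Inductor implementation): allocate the cat output once and write each mm directly into the corresponding slice, instead of materializing each mm separately and copying.

```python
import torch

# Eager-mode analogue of the planning this PR enables (illustrative only):
# instead of computing each mm into its own buffer and copying into the cat
# result, allocate the cat output once and realize each mm into a slice of it.
a, b = torch.randn(8, 16), torch.randn(8, 16)
w1, w2 = torch.randn(16, 32), torch.randn(16, 32)

out = torch.empty(16, 32)        # pre-planned buffer for the cat along dim 0
torch.mm(a, w1, out=out[:8])     # first mm written directly into its slice
torch.mm(b, w2, out=out[8:])     # second mm written directly into its slice

assert torch.allclose(out, torch.cat([a @ w1, b @ w2], dim=0))
```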
Discussion for reviewers:
It feels a little bit odd that in the existing code we set the output of aten.mm as FlexibleLayout. While this is correct, it might lead to passing non-performant output strides to cuBLAS. I guess this is better than a copy? Not sure. We could also introduce a Layout that denotes a fixed shape and stride whose allocation we control: ``` class AllocatedFixedLayout(FixedLayout) ```
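A hedged sketch of what that could look like (hypothetical; only the name mirrors the suggestion above, and the base class is Inductor's existing FixedLayout):

```python
from torch._inductor.ir import FixedLayout

# Hypothetical sketch of the proposed layout: identical to FixedLayout, but the
# distinct type would signal that Inductor itself controls the allocation, so a
# choice like aten.mm could be realized into a pre-planned cat buffer without
# its strides being treated as freely re-orderable.
class AllocatedFixedLayout(FixedLayout):
    """Fixed shape and stride whose backing allocation Inductor controls."""
    pass
```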
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang