@desertfire
Contributor

@desertfire desertfire commented Dec 12, 2024

Summary: Switching to one-pass codegen in #141980 introduced a grid computation issue: when max-autotune is turned on, the generated grid is incorrect in some cases.

Reviewed By: henrylhtsang

Differential Revision: D67120987
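
For readers unfamiliar with the term "grid" here: the launch grid of a tiled kernel is usually derived from the problem size and the block sizes, and max-autotune may pick different block sizes per candidate. The sketch below only illustrates that general dependency. The helper names `cdiv` and `matmul_grid` and the shapes are hypothetical, and this is not the Inductor code changed by this PR.

```python
from __future__ import annotations


def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return (a + b - 1) // b


def matmul_grid(m: int, n: int, block_m: int, block_n: int) -> tuple[int, int, int]:
    """1-D launch grid with one program per (block_m x block_n) output tile."""
    return (cdiv(m, block_m) * cdiv(n, block_n), 1, 1)


# Same (hypothetical) 1024 x 1024 output, two hypothetical autotuning candidates.
grid_large_tiles = matmul_grid(1024, 1024, block_m=128, block_n=128)
grid_small_tiles = matmul_grid(1024, 1024, block_m=64, block_n=64)

print(grid_large_tiles, grid_small_tiles)
```

Because the grid changes with the chosen block sizes, a grid emitted for one candidate would not in general be valid for another, which is one plausible way a grid can end up wrong when autotuning is involved.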

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Dec 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143098

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8eacd41 with merge base 84f7913 (image):

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D67120987

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 12, 2024
@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
Summary: There is a grid computation issue after switching to one-pass codegen in pytorch#141980. When max-autotune is turned on, there is an incorrect grid codegen in some cases.

Reviewed By: henrylhtsang

Differential Revision: D67120987

Pull Request resolved: pytorch#143098
Approved by: https://github.com/henrylhtsang
@bhack
Contributor

bhack commented Dec 16, 2024

@desertfire I don't know whether this is related. This PR fixed compile + AOTI export with autotuning for me, but I still see the following issue on an L40S GPU:

E1216 site-packages/torch/_inductor/select_algorithm.py:1756] [0/0] Exception out of resource: shared memory, Required: 131072, Hardware limit:101376. Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/torchinductor_root/tc/.....py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=5, num_warps=8)
W1216 site-packages/torch/_inductor/select_algorithm.py:1997] [0/0] out of resource: shared memory, Required: 131072, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
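
As a rough sanity check on the numbers in this log, here is a back-of-the-envelope sketch. It assumes 2-byte elements (the log does not state the dtype) and that each buffered pipeline stage holds one `BLOCK_M x BLOCK_K` tile and one `BLOCK_K x BLOCK_N` tile. Triton's actual allocation may differ, and `tile_bytes_per_stage` is a hypothetical helper, not a Triton or Inductor API.

```python
def tile_bytes_per_stage(block_m: int, block_n: int, block_k: int, elem_bytes: int) -> int:
    """Bytes for one A tile (block_m x block_k) plus one B tile (block_k x block_n)."""
    return (block_m * block_k + block_k * block_n) * elem_bytes


REQUIRED = 131072  # "Required" value from the log above
LIMIT = 101376     # "Hardware limit" value from the log above

# BLOCK_M=128, BLOCK_N=128, BLOCK_K=64 come from the log; elem_bytes=2 is an assumption.
per_stage = tile_bytes_per_stage(128, 128, 64, elem_bytes=2)

buffers_required = REQUIRED // per_stage  # stage buffers implied by "Required"
buffers_that_fit = LIMIT // per_stage     # stage buffers that fit under the limit

print(per_stage, buffers_required, buffers_that_fit)
```

Under these assumptions the requirement corresponds to four buffered stages while only three fit within the reported limit, which is consistent with the log's hint that reducing block sizes or `num_stages` may help.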
