[Inductor] don't set XBLOCK larger than xnumel #138730
Conversation
[ghstack-poisoned]
Chillee left a comment
Honestly, I don't really understand all of our scaling up/down code, and I've run into a couple of bugs with it before.
Is this the right way to solve it, or just a patch?
The stacked PR fails some internal tests: https://www.internalfb.com/diff/D64796587. Those tests exist in OSS, but we don't run them in OSS CI since we don't have H100s... This PR is a patch to quickly fix those test failures. A fundamental fix may require auditing all the scaling rules for block sizes and all kinds of overrides (like min_elem_per_thread).
Chillee left a comment
Can you open an issue to track refactoring our scaling logic here? I think it's pretty messy.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -i
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -i
@pytorchbot merge -f '..'
You need to provide a reason for using force merge, in the format @pytorchbot merge -f 'Explanation'.
@pytorchbot merge -f "merge -i does not work"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
When an fp8 dtype is involved, Inductor may set min_elem_per_thread to a positive value. This forces XBLOCK to be increased even for a small xnumel (e.g. 1), and Inductor then reports an error when it later sanity-checks the Triton config.
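A rough sketch of that failure mode, assuming a hypothetical scale-up helper; the function name, the warp count, and the min_elem_per_thread value below are illustrative, not Inductor's actual heuristic:

```python
# Hypothetical illustration of the scale-up, not Inductor's real code.
def scale_xblock(xblock: int, num_warps: int, min_elem_per_thread: int) -> int:
    threads_per_block = num_warps * 32  # 32 threads per warp on NVIDIA GPUs
    # A positive min_elem_per_thread forces the block to cover at least
    # threads_per_block * min_elem_per_thread elements.
    required = threads_per_block * min_elem_per_thread
    while xblock < required:
        xblock *= 2
    return xblock

# With xnumel == 1 the launch only needs XBLOCK == 1, but a positive
# min_elem_per_thread (as used for fp8) blows XBLOCK up anyway, and the
# resulting config fails the later sanity check against xnumel.
print(scale_xblock(1, num_warps=4, min_elem_per_thread=8))  # -> 1024
```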
The simple fix here is to not let XBLOCK be larger than xnumel.
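A minimal sketch of the capping idea, assuming a power-of-two rounding helper; cap_xblock and next_power_of_2 are illustrative names, not the exact code in this PR:

```python
import math

def next_power_of_2(n: int) -> int:
    # Round n up to the nearest power of two (Triton block sizes are powers of two).
    return 1 if n <= 1 else 2 ** math.ceil(math.log2(n))

def cap_xblock(xblock: int, xnumel: int) -> int:
    # Never pick an XBLOCK wider than the (power-of-two rounded) problem size.
    return min(xblock, next_power_of_2(xnumel))

print(cap_xblock(1024, xnumel=1))   # -> 1: the fp8 scale-up no longer wins
print(cap_xblock(64, xnumel=1000))  # -> 64: large problems are unaffected
```

Rounding xnumel up keeps XBLOCK a power of two while still collapsing it to 1 in the xnumel == 1 case that triggered the error.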
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov