Bugfix for passing None args to user defined Triton kernel #138260
Commits: add test · fewer failing tests · more tests passing · tests passing · lint [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138260
Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 2 Unrelated Failures as of commit cbc1bc0 with merge base de51ed8:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This is a PR to fix the following issue: #115344

In short, passing None as an argument to a user-defined Triton kernel would cause problems under torch.compile. Short repro of the specific bug fixed here:

```python
import torch
import triton
import triton.language as tl


@triton.autotune(  # E: Untyped decorator makes function "sin_kernel" untyped  [misc]
    configs=[
        triton.Config({'BLOCK_SIZE': 32}, num_stages=5, num_warps=2),
        triton.Config({'BLOCK_SIZE': 64}, num_stages=4, num_warps=4),
    ],
    key=['n_elements'],
)
@triton.jit  # E: Untyped decorator makes function "sin_kernel" untyped  [misc]
def sin_kernel(  # E: Function is missing a return type annotation  [no-untyped-def]
    in_ptr0,
    out_ptr,
    n_elements,
    BLOCK_SIZE: "tl.constexpr",
):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    if in_ptr0 is not None:
        x = tl.load(in_ptr0 + offsets, mask=mask)
    else:
        x = 0.0
    output = tl.sin(x)
    tl.store(out_ptr + offsets, output, mask=mask)


def sin_triton(x, out):
    n_elements = out.numel()
    sin_kernel[(n_elements,)](x, out, n_elements)


x = torch.randn(65, device="cuda")
out = torch.empty_like(x)
out_compiled = torch.empty_like(x)
sin_triton_compiled = torch.compile(fullgraph=True)(sin_triton)

for first in (x, None):
    sin_triton(first, out)
    sin_triton_compiled(first, out_compiled)
    torch.testing.assert_close(out, out_compiled)
```

I've added a unit test to the tests in "test/inductor/test_triton_kernels.py" to catch this issue in the future.

================================================================================

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
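For illustration only, the sketch below shows roughly what a standalone regression check for this behavior could look like. It is not the test added in test/inductor/test_triton_kernels.py; the kernel, function, and class names and the block size are made up for the example, and it assumes a CUDA device with Triton available.

```python
# Hypothetical, self-contained sketch (not the actual test added by this PR):
# compares eager and torch.compile results when a Triton kernel pointer
# argument is a real tensor vs. None.
import unittest

import torch
import triton
import triton.language as tl


@triton.jit
def copy_or_zero_kernel(src_ptr, dst_ptr, n_elements, BLOCK_SIZE: "tl.constexpr"):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    if src_ptr is not None:
        # Only dereference the pointer when it was actually passed.
        val = tl.load(src_ptr + offsets, mask=mask)
    else:
        val = tl.zeros([BLOCK_SIZE], dtype=tl.float32)
    tl.store(dst_ptr + offsets, val, mask=mask)


def copy_or_zero(src, dst):
    n_elements = dst.numel()
    grid = (triton.cdiv(n_elements, 64),)
    copy_or_zero_kernel[grid](src, dst, n_elements, BLOCK_SIZE=64)


@unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")
class TestNoneTritonKernelArgs(unittest.TestCase):
    def test_none_arg_matches_eager(self):
        compiled = torch.compile(copy_or_zero, fullgraph=True)
        src = torch.randn(100, device="cuda")
        for maybe_src in (src, None):
            eager_out = torch.empty_like(src)
            compiled_out = torch.empty_like(src)
            copy_or_zero(maybe_src, eager_out)
            compiled(maybe_src, compiled_out)
            torch.testing.assert_close(eager_out, compiled_out)


if __name__ == "__main__":
    unittest.main()
```

The actual test in the PR may be structured differently; the point is simply that the compiled path is compared against eager for both the tensor case and the None case.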
@pytorchbot label "topic: not user facing"
aakhundov left a comment
Thanks, @SamGinzburg! Looks good overall. Added some minor comments and questions.
cc @zou3519 as this fixes our long-standing issue
@SamGinzburg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Can't merge closed PR #138260
This PR was merged manually by mistake. Resubmitted in #138472. |
This is a PR to fix the following issue: #115344
In short, passing None as an argument to a user-defined Triton kernel would cause problems under torch.compile.
A short repro of the specific bug fixed here is included in the conversation above.
I've added a unit test to test/inductor/test_triton_kernels.py to catch this issue in the future.
topic: not user facing
================================================================================
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
Differential Revision: D64615061