Conversation
@SamGinzburg (Contributor) commented Oct 17, 2024

This is a PR to fix the following issue: #115344

In short, passing None as an argument to a user-defined Triton kernel under torch.compile would cause problems:

Short repro of the specific bug fixed here:

```
import triton
import triton.language as tl

@triton.autotune( # E: Untyped decorator makes function "sin_kernel" untyped  [misc]
    configs=[
        triton.Config({'BLOCK_SIZE': 32}, num_stages=5, num_warps=2),
        triton.Config({'BLOCK_SIZE': 64}, num_stages=4, num_warps=4),
    ],
    key=['n_elements']
)
@triton.jit # E: Untyped decorator makes function "sin_kernel" untyped  [misc]
def sin_kernel( # E: Function is missing a return type annotation  [no-untyped-def]
    in_ptr0,
    out_ptr,
    n_elements,
    BLOCK_SIZE: "tl.constexpr",
):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    if in_ptr0 is not None:
        x = tl.load(in_ptr0 + offsets, mask=mask)
    else:
        x = 0.
    output = tl.sin(x)
    tl.store(out_ptr + offsets, output, mask=mask)

import torch

def sin_triton(x, out):
    n_elements = out.numel()
    sin_kernel[(n_elements,)](x, out, n_elements)

x = torch.randn(65, device="cuda")
out = torch.empty_like(x)
out_compiled = torch.empty_like(x)

sin_triton_compiled = torch.compile(fullgraph=True)(sin_triton)

for first in (x, None):
    sin_triton(first, out)
    sin_triton_compiled(first, out_compiled)
    torch.testing.assert_close(out, out_compiled)
```
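A quick gloss on the final loop (my own note, not part of the original report): for each value of `first`, the eager launch and the compiled launch should both match a plain-PyTorch reference. A minimal sketch of that reference, using a hypothetical helper name:
```
import torch

def reference(first, out):
    # Eager-PyTorch equivalent of sin_kernel: when `first` is None the kernel
    # takes the `x = 0.` branch, so every stored element is sin(0.0) == 0.0.
    if first is None:
        return torch.zeros_like(out)
    return torch.sin(first)
```
The `first is None` iteration is the one this PR is about; the tensor iteration already worked before the fix.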

I've added a unit test to the tests in "test/inductor/test_triton_kernels.py" to catch this issue in the future.
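Some context on why None arguments are awkward here (my reading of the underlying issue, not a description of this PR's actual implementation): Triton resolves `if in_ptr0 is not None:` at specialization/compile time, so passing None effectively selects a different compiled variant of the kernel. Any machinery that wraps and caches user-defined Triton kernel launches therefore has to treat the None-ness of each argument as part of the kernel's specialization. A minimal sketch of that invariant, with hypothetical names:
```
def specialization_key(kernel_name, args):
    # Hypothetical helper (not PyTorch's actual code): record which argument
    # positions are None, since the None and non-None cases compile to
    # different kernel variants.
    none_positions = tuple(i for i, a in enumerate(args) if a is None)
    return (kernel_name, none_positions)

# Calls that differ only in passing None must not reuse the same compiled variant:
assert specialization_key("sin_kernel", (None, "out", 65)) != specialization_key("sin_kernel", ("x", "out", 65))
```
The real handling lives in Dynamo/Inductor's support for user-defined Triton kernels; the sketch only illustrates the kind of invariant the new unit test exercises.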

topic: not user facing

================================================================================
Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

Differential Revision: [D64615061](https://our.internmc.facebook.com/intern/diff/D64615061)

@SamGinzburg requested a review from zou3519 as a code owner, October 17, 2024 21:06

pytorch-bot bot commented Oct 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138260

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 2 Unrelated Failures

As of commit cbc1bc0 with merge base de51ed8:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Oct 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@SamGinzburg (Contributor, Author) commented:

@pytorchbot label "topic: not user facing"

@pytorch-bot bot added the "topic: not user facing" label, Oct 17, 2024
SamGinzburg added a commit that referenced this pull request Oct 17, 2024
add test

fewer failing tests

more tests passing

tests passing

lint

ghstack-source-id: f089650
Pull Request resolved: #138260
@aakhundov (Contributor) left a comment

Thanks, @SamGinzburg! Looks good overall. Added some minor comments and questions.

@aakhundov (Contributor) commented:

cc @zou3519 as this fixes our long-standing None arg issue :)

SamGinzburg added a commit that referenced this pull request Oct 18, 2024
add test

fewer failing tests

more tests passing

tests passing

lint

ghstack-source-id: 749f8ff
Pull Request resolved: #138260

@SamGinzburg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@aakhundov requested a review from eellison, October 18, 2024 19:05
@aakhundov requested a review from oulgen, October 18, 2024 19:06
@pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request), Oct 18, 2024
SamGinzburg added a commit that referenced this pull request Oct 18, 2024
add test

fewer failing tests

more tests passing

tests passing

lint

ghstack-source-id: 4cb2b4f
Pull Request resolved: #138260

@SamGinzburg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@SamGinzburg merged commit d74f5bd into gh/SamGinzburg/5/base, Oct 21, 2024
@SamGinzburg (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Can't merge closed PR #138260

SamGinzburg added a commit that referenced this pull request Oct 21, 2024
…138469)

Revert "Bugfix for passing None args to user defined Triton kernel (#138260)"

This reverts commit d74f5bd.
@aakhundov (Contributor) commented:

This PR was merged manually by mistake. Resubmitted in #138472.

Labels: ciflow/inductor, ciflow/trunk, module: inductor, topic: not user facing