[inductor] don't match indirect indexing in fusion by ngimel · Pull Request #96273 · pytorch/pytorch

ngimel · 2023-03-08T05:32:05Z

When deciding whether to fuse nodes, we match index expressions such as c0 + 5 * tmp0, but tmp0 in different nodes can refer to completely different values. Even when tmp0 is the same (as in the added test), inductor still generates wrongly ordered loads and stores (the load comes before the store), so it is better to disable this fusion altogether. We should also fix the wrong ordering; this is the kernel generated before this PR:

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': ['out_ptr0'], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 5
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0_load = tl.load(in_ptr0 + (0))
    tmp0 = tl.broadcast_to(tmp0_load, [XBLOCK])
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.load(out_ptr0 + (x0 + (5*tmp0)), xmask)
    tl.store(out_ptr0 + (x0 + (5*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)

Note: we are loading from out_ptr0 here, which shouldn't happen, and we are loading from it before storing to it.
After this PR, the kernel above is split into two kernels.
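
To make the two problems concrete, here is a minimal plain-Python sketch. It is not inductor or Triton code; the function names, buffer sizes, and input values are all hypothetical and chosen only for illustration. It assumes the intended semantics are "store to the buffer, then load from it", which is what the description above implies by calling load-before-store the wrong order.

```python
# Plain-Python model of the kernel above (illustrative only, not inductor code).
STRIDE = 5  # mirrors the literal 5 in the index x0 + 5*tmp0


def fused_load_before_store(index_buf, src, out0):
    """Model of the fused kernel: out0 is read before it is written."""
    tmp0 = index_buf[0]
    out1 = []
    for x0 in range(STRIDE):
        tmp1 = src[x0]
        tmp2 = out0[x0 + STRIDE * tmp0]   # load sees the old contents of out0
        out0[x0 + STRIDE * tmp0] = tmp1   # store happens too late
        out1.append(tmp2)
    return out1


def split_store_then_load(index_buf, src, out0):
    """Model of two separate kernels: every store finishes before any load."""
    tmp0 = index_buf[0]
    for x0 in range(STRIDE):              # "kernel 1": indirect store
        out0[x0 + STRIDE * tmp0] = src[x0]
    tmp0 = index_buf[0]                   # "kernel 2" loads its own tmp0
    return [out0[x0 + STRIDE * tmp0] for x0 in range(STRIDE)]


src = [1.0, 2.0, 3.0, 4.0, 5.0]
stale = fused_load_before_store([1], src, [0.0] * 10)
fresh = split_store_then_load([1], src, [0.0] * 10)
print(stale)  # old zeros from out0, not the values just written
print(fresh)  # the values from src


def address(c0, tmp0):
    """Evaluate the index expression c0 + 5*tmp0 for concrete values."""
    return c0 + 5 * tmp0


# Two nodes can print the same index text while binding tmp0 differently,
# so textual equality of the index does not imply equal addresses.
addr_a = address(2, 0)  # node A: tmp0 happened to be 0
addr_b = address(2, 1)  # node B: tmp0 happened to be 1
print(addr_a, addr_b)
```

In this model the fused version returns the buffer's old contents, while the split version returns the freshly stored values. The last few lines show why an index containing tmp0 cannot be compared by its text alone.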

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

pytorch-bot · 2023-03-08T05:32:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96273


✅ No Failures

As of commit 903c6ae:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ngimel · 2023-03-08T18:36:37Z

@pytorhbot merge

ngimel · 2023-03-09T00:42:55Z

@pytorchbot merge

ngimel · 2023-03-09T22:59:55Z

@pytorchbot merge

pytorchmergebot · 2023-03-09T23:03:38Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Fixes #96064

When deciding whether to fuse nodes, we match indexing like `c0 + 5 * tmp0`, but `tmp0` in the different nodes can refer to totally different values. Even when `tmp0` is the same (like in the added test) inductor still generates wrongly ordered loads and stores (loads come before stores), so better just disable this fusion altogether. We should fix wrong order also:

```
@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': ['out_ptr0'], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 5
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0_load = tl.load(in_ptr0 + (0))
    tmp0 = tl.broadcast_to(tmp0_load, [XBLOCK])
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.load(out_ptr0 + (x0 + (5*tmp0)), xmask)
    tl.store(out_ptr0 + (x0 + (5*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
```

Note: we are loading from `out_ptr0` here (that shouldn't happen), we are loading from it before storing to it. After this PR, the kernel above is split in 2.

Pull Request resolved: pytorch/pytorch#96273
Approved by: https://github.com/jansel


don't match indirect indexing in fusion

bdfad15

ngimel requested review from Chillee and jansel March 8, 2023 05:32

github-actions bot added the ciflow/inductor and module: inductor labels Mar 8, 2023

ngimel added the release notes: inductor and topic: bug fixes (topic category) labels Mar 8, 2023

lint

903c6ae

jansel approved these changes Mar 8, 2023


pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Mar 9, 2023

pytorchmergebot added the Merged label Mar 9, 2023

pytorchmergebot closed this in 05b679c Mar 9, 2023

ngimel deleted the ngimel/indirect_index branch March 14, 2023 06:08

ngimel wants to merge 2 commits into master from ngimel/indirect_index


3 participants
