[triton 3.2] inductor cumsum w/ upstream triton fails accuracy and device asserts #139348

@davidberard98

🐛 Describe the bug

Repro:

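import torch

def fn(x):
    # Cumulative sum along dim 0 (down each column).
    return x.cumsum(0)

x = torch.rand(100, 4000, device='cuda')

expect = fn(x)                 # eager-mode reference
actual = torch.compile(fn)(x)  # Inductor-compiled version

# assert_allclose is deprecated; assert_close(actual, expected) replaces it.
torch.testing.assert_close(actual, expect)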
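
For reference, the result the repro expects can be illustrated on a tiny input without a GPU. The following is only a pure-Python sketch of what `cumsum(0)` computes (a running sum down each column). The helper name `cumsum_dim0` is hypothetical, and this is not how eager PyTorch or Inductor implements the op.

```python
from itertools import accumulate

def cumsum_dim0(rows):
    """Running sum down each column of a list-of-rows matrix.

    Hypothetical helper for illustration; mirrors the semantics of
    x.cumsum(0) on a 2-D input, not its implementation.
    """
    if not rows:
        return []
    # Transpose to columns, accumulate each column, then transpose back.
    summed_columns = [list(accumulate(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*summed_columns)]

small = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = cumsum_dim0(small)
print(result)
```

Each output row is the sum of all input rows up to and including it, so a wrong value in one row of the compiled output would be expected to show up from that row onward within the affected column.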

Versions

PyTorch at commit 5c6d354, Triton at commit 018c139d2b843c29c3b4d4d2e6f3e672b8ff0b3a, running on an H100.

cc @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @bertmaher @int3 @nmacchioni @embg @peterbell10

Labels

module: inductor, oncall: pt2, triaged, upstream triton
