
Conversation

Collaborator

@bohnstingl bohnstingl commented Aug 8, 2024

This is part of a series of PRs to improve the functionality of `associative_scan`. This specific PR introduces a `combine_mode`, which can be either `pointwise` (default) or `generic`. In the case of `generic`, the `associative_scan` is more flexible and also allows non-pointwise combine functions. This PR has been derived from #129307.
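For context, a minimal usage sketch (the import path and argument order here are assumptions based on this PR's description, and the call is wrapped in `torch.compile` because, as noted further down this thread, the eager/dense path is not implemented):

```python
import torch
from torch._higher_order_ops.associative_scan import associative_scan


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Pointwise combine_fn: valid for both combine_mode="pointwise" (default) and "generic".
    return x + y


def matmul_combine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Non-pointwise combine_fn (batched matrix multiply): only valid with combine_mode="generic".
    return a @ b


def run(x: torch.Tensor):
    cumsum = associative_scan(add, x, 0, combine_mode="pointwise")
    matprod = associative_scan(matmul_combine, x, 0, combine_mode="generic")
    return cumsum, matprod


if torch.cuda.is_available():
    # combine_mode="pointwise" currently requires CUDA (Triton lowering); see the review discussion below.
    x = torch.randn(4, 2, 2, device="cuda")
    out_sum, out_prod = torch.compile(run)(x)
```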

@ydwu4 @Chillee @zou3519

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec


pytorch-bot bot commented Aug 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133012

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6d1d698 with merge base bb22132:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 requested review from Chillee, ydwu4 and zou3519 August 9, 2024 12:31
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 9, 2024
```python
@unittest.skipIf(not torch.cuda.is_available(), "Test requires CUDA.")
@parametrize("reverse", [False, True])
@parametrize("combine_mode", ["pointwise", "generic"])
@parametrize("device", [torch.device("cuda")])
```
Contributor

Is the device argument necessary? Can we delete it? Same for other tests.

Collaborator Author

Yes, the device argument should test CPU and CUDA tensors. I updated the test cases to reflect this. In the case of combine_mode=pointwise and the CPU device, the test is "skipped".

Contributor

@ydwu4 ydwu4 Aug 20, 2024

Can you figure out why combine_mode=pointwise x CPU fails? It doesn't need to be solved in this PR. Maybe instead of skipping, we xfail it with a proper reason.

Collaborator Author

@bohnstingl bohnstingl Aug 21, 2024

Well, my understanding was that, because of the lowering to Triton, only the CUDA device is supported. I double-checked and this seems to be the case.

Regarding the xfail: I don't know how to do the xfail properly. My specific problem is that only a subset of the test's parameter combinations fails, e.g., CPU x pointwise. Is there an example that I can look at?
I tried xfail_inherited_tests and xfail without much success.
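One possible pattern is to attach the xfail decorator to a single parameter value via `subtest` (a rough sketch only; it assumes `subtest` from `torch.testing._internal.common_utils` accepts a `decorators` argument, and the real test body is elided):

```python
import unittest

import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
    subtest,
)


class AssociativeScanCPUTests(TestCase):
    # Only the combine_mode="pointwise" value is expected to fail on CPU,
    # so the xfail decorator is attached to that single parameter value.
    @parametrize(
        "combine_mode",
        [
            subtest("pointwise", name="pointwise", decorators=[unittest.expectedFailure]),
            "generic",
        ],
    )
    def test_associative_scan_cpu(self, combine_mode):
        x = torch.randn(4, 3)  # placeholder input
        ...  # the real associative_scan call with combine_mode would go here


instantiate_parametrized_tests(AssociativeScanCPUTests)

if __name__ == "__main__":
    run_tests()
```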

Contributor

Wait, if only CUDA is supported for "pointwise", how are the first few runs OK? We could list it out as a new test.

Collaborator Author

I've marked the combine_mode='pointwise' x CPU test cases as skipped and added a comment.

Collaborator

It seems that ROCm is also failing these pointwise tests, even though we are using the cuda device. Any ideas here?

```python
class AssociativeScanTests(TestCase):
    @requires_gpu
    @parametrize("device", [torch.device("cuda")])
    @parametrize("combine_mode", ["generic"])
```
Contributor

@ydwu4 ydwu4 Aug 20, 2024

Can we add "pointwise" to combine_mode?

Collaborator Author

Sure, this can be done. I extended the test case. However, this is the test that currently fails with the weird behavior of the flip operation that I mentioned.

Contributor

@ydwu4 ydwu4 left a comment

Looks good! Wait for CI.

@ydwu4 ydwu4 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 23, 2024
@bohnstingl bohnstingl requested a review from ydwu4 August 30, 2024 15:31
Contributor

ydwu4 commented Aug 30, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@ydwu4 ydwu4 added the topic: not user facing topic category label Aug 30, 2024
Contributor

ydwu4 commented Aug 30, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Collaborator

jataylo commented Sep 12, 2024

Hey @bohnstingl @ydwu4 @Chillee @zou3519, cc @jeffdaily. This change seems to have broken some ROCm tests. Could you help us pinpoint what the issue may be here, or give some pointers on how we can debug this?

Hud link:
https://hud.pytorch.org/failure?name=rocm%20%2F%20linux-focal-rocm6.1-py3.8%20%2F%20test%20(default%2C%205%2C%206%2C%20linux.rocm.gpu.2)&jobName=undefined&failureCaptures=%5B%22functorch%2Ftest_control_flow.py%3A%3ATestControlFlow%3A%3Atest_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda%22%5D

Snippet:

```
    raise LoweringException(e, target, args, kwargs).with_traceback(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan
    raise RuntimeError("Unable to generate code for associative_scan op")
torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op
  target: associative_scan
  args[0]: Subgraph(name='scan_combine_graph_0', graph_module=<lambda>(), graph=None)
  args[1]: [TensorBox(StorageBox(
    Pointwise(
      'cuda',
      torch.float32,
      def inner_fn(index):
          i0, i1, i2 = index
          tmp0 = ops.load(primals_1, 8 + i2 + -4 * i0 + 2 * i1)
          return tmp0
      ,
      ranges=[3, 2, 2],
      origin_node=rev,
      origins=OrderedSet([rev])
    )
  )), TensorBox(StorageBox(
    Pointwise(
      'cuda',
      torch.float32,
      def inner_fn(index):
          i0, i1, i2 = index
          tmp0 = ops.load(primals_2, 8 + i2 + -4 * i0 + 2 * i1)
          return tmp0
      ,
      ranges=[3, 2, 2],
      origin_node=rev_1,
      origins=OrderedSet([rev_1])
    )
  ))]
  args[2]: 0
```

After running this locally I can see that only the tests with the pointwise combine_mode are failing:

```
ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda
```

@bohnstingl
Collaborator Author

Hi @jataylo,
Could you maybe provide the full error log?

On my local machine the tests are running fine. However, on a different note, @ydwu4 and I are currently working on a feature related to associative_scan, the scan operator. In that same PR we are updating these tests as well and can look at that.

Contributor

ydwu4 commented Sep 12, 2024

We can probably skip the ROCm tests for the associative_scan tests. From the error log, ir.Scan.Create in the lowering logic returns None. Seems like a Triton x ROCm issue or something.
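A rough sketch of what such a skip could look like, assuming the usual `skipIfRocm` decorator from `torch.testing._internal.common_utils` (the actual test body is elided):

```python
import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
    skipIfRocm,
)


class PointwiseScanTests(TestCase):
    @skipIfRocm  # temporarily skip on ROCm while the lowering failure is triaged
    @parametrize("reverse", [False, True])
    def test_pointwise_associative_scan_binary_operator(self, reverse):
        x = torch.randn(3, 2, 2, device="cuda")  # placeholder input
        ...  # the existing combine_mode="pointwise" assertions would run here


instantiate_parametrized_tests(PointwiseScanTests)

if __name__ == "__main__":
    run_tests()
```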

Collaborator

jataylo commented Sep 13, 2024

Hey @bohnstingl , @ydwu4 full error log here https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445

I don't see this getting to any Triton lowering before failing; it seems like this is more of a PyTorch logic issue than a Triton issue. We were passing scan UTs before this change too, so we would like to figure out what is going on.

EDIT: this also fails with the eager compile backend.

  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense
    raise NotImplementedError("associative_scan is not implemented for eager")
NotImplementedError: associative_scan is not implemented for eager

pytorchmergebot pushed a commit that referenced this pull request Sep 14, 2024
#133012 caused a regression on ROCm, causing the pointwise scan tests to fail

```
ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda
```

Skipping temporarily while triage is underway.

Full log: https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445

```
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan
    raise RuntimeError("Unable to generate code for associative_scan op")
torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op
```

NOTE: even "eager" backend fails
```
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense
    raise NotImplementedError("associative_scan is not implemented for eager")
NotImplementedError: associative_scan is not implemented for eager
```

Pull Request resolved: #135995
Approved by: https://github.com/malfet
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024

Labels

ciflow/inductor, ciflow/trunk, Merged, module: dynamo, module: inductor, open source, topic: not user facing, triaged


6 participants