
Conversation

Collaborator

@bohnstingl bohnstingl commented Aug 8, 2024

This is part of a series of PRs to improve the functionality of `associative_scan`. This specific PR introduces a `combine_mode`, which can be either `pointwise` (default) or `generic`. In the case of `generic`, the `associative_scan` is more flexible and also allows non-pointwise combine functions. This PR has been derived from #129307.
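For context, a minimal usage sketch (the import path and argument order here are assumptions based on this PR's description, and the call is wrapped in `torch.compile` because, as noted further down this thread, the eager/dense path is not implemented):

```python
import torch
from torch._higher_order_ops.associative_scan import associative_scan


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Pointwise combine_fn: valid for both combine_mode="pointwise" (default) and "generic".
    return x + y


def matmul_combine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Non-pointwise combine_fn (batched matrix multiply): only valid with combine_mode="generic".
    return a @ b


def run(x: torch.Tensor):
    cumsum = associative_scan(add, x, 0, combine_mode="pointwise")
    matprod = associative_scan(matmul_combine, x, 0, combine_mode="generic")
    return cumsum, matprod


if torch.cuda.is_available():
    # combine_mode="pointwise" currently requires CUDA (Triton lowering); see the review discussion below.
    x = torch.randn(4, 2, 2, device="cuda")
    out_sum, out_prod = torch.compile(run)(x)
```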

@ydwu4 @Chillee @zou3519

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec


pytorch-bot bot commented Aug 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133012

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6d1d698 with merge base bb22132:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 requested review from Chillee, ydwu4 and zou3519 August 9, 2024 12:31
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 9, 2024
```python
@unittest.skipIf(not torch.cuda.is_available(), "Test requires CUDA.")
@parametrize("reverse", [False, True])
@parametrize("combine_mode", ["pointwise", "generic"])
@parametrize("device", [torch.device("cuda")])
```
Contributor

Is the device argument necessary? Can we delete it? Same for other tests.

Collaborator Author

Yes, the device argument should test CPU and CUDA tensors. I updated the test cases to reflect this. In the case of combine_mode=pointwise and the CPU device, the test is "skipped".

Contributor

@ydwu4 ydwu4 Aug 20, 2024

Can you figure out why combine_mode=pointwise x CPU fails? It doesn't need to be solved in this PR. Maybe instead of skipping, we xfail it with a proper reason.

Collaborator Author

@bohnstingl bohnstingl Aug 21, 2024

Well, my understanding was that, because of the lowering to Triton, only the CUDA device is supported. I double-checked and this seems to be the case.

Regarding the xfail: I don't know how to do the xfail properly. My specific problem is that only a subset of the test's parameter combinations fails, e.g., CPU x pointwise. Is there an example that I can look at?
I tried xfail_inherited_tests and xfail without much success.
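One possible pattern is to attach the xfail decorator to a single parameter value via `subtest` (a rough sketch only; it assumes `subtest` from `torch.testing._internal.common_utils` accepts a `decorators` argument, and the real test body is elided):

```python
import unittest

import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
    subtest,
)


class AssociativeScanCPUTests(TestCase):
    # Only the combine_mode="pointwise" value is expected to fail on CPU,
    # so the xfail decorator is attached to that single parameter value.
    @parametrize(
        "combine_mode",
        [
            subtest("pointwise", name="pointwise", decorators=[unittest.expectedFailure]),
            "generic",
        ],
    )
    def test_associative_scan_cpu(self, combine_mode):
        x = torch.randn(4, 3)  # placeholder input
        ...  # the real associative_scan call with combine_mode would go here


instantiate_parametrized_tests(AssociativeScanCPUTests)

if __name__ == "__main__":
    run_tests()
```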

Contributor

Wait, if only CUDA is supported for "pointwise", how are the first few runs OK? We could list it out as a new test.

Collaborator Author

I've marked the combine_mode='pointwise' x CPU test cases as skipped and added a comment.

Collaborator

It seems that ROCm is also failing these pointwise tests, even though we are using the cuda device. Any ideas here?

```python
class AssociativeScanTests(TestCase):
    @requires_gpu
    @parametrize("device", [torch.device("cuda")])
    @parametrize("combine_mode", ["generic"])
```
Contributor

@ydwu4 ydwu4 Aug 20, 2024

Can we add "pointwise" to combine_mode?

Collaborator Author

Sure, this can be done. I extended the test case. However, this is the test that currently fails with the weird behavior of the flip operation that I mentioned.

Contributor

@ydwu4 ydwu4 left a comment

Looks good! Wait for CI.

@ydwu4 ydwu4 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 23, 2024
@bohnstingl bohnstingl requested a review from ydwu4 August 30, 2024 15:31
Contributor

ydwu4 commented Aug 30, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@ydwu4 ydwu4 added the topic: not user facing topic category label Aug 30, 2024
Contributor

ydwu4 commented Aug 30, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Collaborator

jataylo commented Sep 12, 2024

Hey @bohnstingl @ydwu4 @Chillee @zou3519, cc @jeffdaily. This change seems to have broken some ROCm tests. Could you help us pinpoint what the issue may be here, or give some pointers on how we can debug this?

Hud link:
https://hud.pytorch.org/failure?name=rocm%20%2F%20linux-focal-rocm6.1-py3.8%20%2F%20test%20(default%2C%205%2C%206%2C%20linux.rocm.gpu.2)&jobName=undefined&failureCaptures=%5B%22functorch%2Ftest_control_flow.py%3A%3ATestControlFlow%3A%3Atest_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda%22%5D

Snippet:

```
    raise LoweringException(e, target, args, kwargs).with_traceback(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan
    raise RuntimeError("Unable to generate code for associative_scan op")
torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op
  target: associative_scan
  args[0]: Subgraph(name='scan_combine_graph_0', graph_module=<lambda>(), graph=None)
  args[1]: [TensorBox(StorageBox(
    Pointwise(
      'cuda',
      torch.float32,
      def inner_fn(index):
          i0, i1, i2 = index
          tmp0 = ops.load(primals_1, 8 + i2 + -4 * i0 + 2 * i1)
          return tmp0
      ,
      ranges=[3, 2, 2],
      origin_node=rev,
      origins=OrderedSet([rev])
    )
  )), TensorBox(StorageBox(
    Pointwise(
      'cuda',
      torch.float32,
      def inner_fn(index):
          i0, i1, i2 = index
          tmp0 = ops.load(primals_2, 8 + i2 + -4 * i0 + 2 * i1)
          return tmp0
      ,
      ranges=[3, 2, 2],
      origin_node=rev_1,
      origins=OrderedSet([rev_1])
    )
  ))]
  args[2]: 0
```

After running this locally I can see that only the tests with the pointwise combine_mode are failing:

```
ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda
```

@bohnstingl
Collaborator Author

Hi @jataylo,
Could you maybe provide the full error log?

On my local machine the tests are running fine. However, on a different note, @ydwu4 and I are currently working on a feature related to associative_scan, the scan operator. In that same PR we are updating these tests as well and can look at that.

Contributor

ydwu4 commented Sep 12, 2024

We can probably skip the ROCm tests for the associative_scan tests. From the error log, ir.Scan.Create in the lowering logic returns None. Seems like a Triton x ROCm issue or something.
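A rough sketch of what such a skip could look like, assuming the usual `skipIfRocm` decorator from `torch.testing._internal.common_utils` (the actual test body is elided):

```python
import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
    skipIfRocm,
)


class PointwiseScanTests(TestCase):
    @skipIfRocm  # temporarily skip on ROCm while the lowering failure is triaged
    @parametrize("reverse", [False, True])
    def test_pointwise_associative_scan_binary_operator(self, reverse):
        x = torch.randn(3, 2, 2, device="cuda")  # placeholder input
        ...  # the existing combine_mode="pointwise" assertions would run here


instantiate_parametrized_tests(PointwiseScanTests)

if __name__ == "__main__":
    run_tests()
```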

Collaborator

jataylo commented Sep 13, 2024

Hey @bohnstingl , @ydwu4 full error log here https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445

I don't see this getting to any Triton lowering before failing; it seems like this is more of a PyTorch logic issue than a Triton issue. We were passing scan UTs before this change too, so we would like to figure out what is going on.

EDIT: this also fails with the eager compile backend.

  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense
    raise NotImplementedError("associative_scan is not implemented for eager")
NotImplementedError: associative_scan is not implemented for eager

pytorchmergebot pushed a commit that referenced this pull request Sep 14, 2024
#133012 caused a regression on ROCm, causing the pointwise scan tests to fail

```
ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda
```

Skipping temporarily while triage is underway.

Full log: https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445

```
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan
    raise RuntimeError("Unable to generate code for associative_scan op")
torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op
```

NOTE: even "eager" backend fails
```
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense
    raise NotImplementedError("associative_scan is not implemented for eager")
NotImplementedError: associative_scan is not implemented for eager
```

Pull Request resolved: #135995
Approved by: https://github.com/malfet
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024

Labels

ciflow/inductor, ciflow/trunk, Merged, module: dynamo, module: inductor, open source, topic: not user facing, triaged


6 participants