@jataylo
Collaborator

@jataylo jataylo commented Sep 13, 2024

#133012 introduced a regression on ROCm that causes the following pointwise associative scan tests to fail:

ERROR: test_pointwise_associative_scan_tuple_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_tuple_reverse_False_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_True_combine_mode_pointwise_cuda
ERROR: test_pointwise_associative_scan_complex_pytree_reverse_False_combine_mode_pointwise_cuda 
ERROR: test_pointwise_associative_scan_binary_operator_reverse_True_combine_mode_pointwise_cuda 
ERROR: test_pointwise_associative_scan_binary_operator_reverse_False_combine_mode_pointwise_cuda

Skipping these tests on ROCm temporarily while triage is underway.

Full log: https://ossci-raw-job-status.s3.amazonaws.com/log/30067645445

  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1020, in call_function
    out = lowerings[target](*args, **kwargs)  # type: ignore[index]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 363, in wrapped
    out = decomp_fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/lowering.py", line 6245, in associative_scan
    raise RuntimeError("Unable to generate code for associative_scan op")
torch._inductor.exc.LoweringException: RuntimeError: Unable to generate code for associative_scan op

NOTE: even the "eager" backend fails:

  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_higher_order_ops/associative_scan.py", line 338, in associative_scan_op_dense
    raise NotImplementedError("associative_scan is not implemented for eager")
NotImplementedError: associative_scan is not implemented for eager
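
For readers unfamiliar with the pattern, the temporary skip described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual change in this PR: `RUNNING_ON_ROCM`, `skip_on_rocm`, `reference_scan`, and the `PointwiseScanTests` class are hypothetical names, the environment variable is an assumed stand-in for the suite's ROCm detection, and the pure-Python scan only exists to give the tests a body.

```python
import itertools
import operator
import os
import unittest

# Assumption: a stand-in for the test suite's ROCm detection. The environment
# variable name is illustrative and is only read here, in this sketch.
RUNNING_ON_ROCM = os.getenv("PYTORCH_TEST_WITH_ROCM", "0") == "1"


def skip_on_rocm(reason):
    """Hypothetical helper: skip the decorated test when targeting ROCm."""
    return unittest.skipIf(RUNNING_ON_ROCM, reason)


def reference_scan(combine_fn, xs, reverse=False):
    """Pure-Python inclusive scan, used only to give the tests a body."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out = list(itertools.accumulate(seq, combine_fn))
    # For a reverse scan, flip the result back into the input's order.
    return out[::-1] if reverse else out


class PointwiseScanTests(unittest.TestCase):
    @skip_on_rocm("regression from #133012; skipped while triage is underway")
    def test_scan_forward(self):
        self.assertEqual(reference_scan(operator.add, [1, 2, 3]), [1, 3, 6])

    @skip_on_rocm("regression from #133012; skipped while triage is underway")
    def test_scan_reverse(self):
        self.assertEqual(
            reference_scan(operator.add, [1, 2, 3], reverse=True), [6, 5, 3]
        )
```

Saved to a file, this can be run with `python -m unittest <file>`. With the flag unset both tests execute; with it set to `1` they are reported as skipped rather than failed, which is what returns the CI signal to green while the real fix is investigated.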

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang

@pytorch-bot

pytorch-bot bot commented Sep 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135995

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit fa7b212 with merge base 7ed0563 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/rocm, module: rocm, and topic: not user facing labels Sep 13, 2024
@jataylo jataylo added the ciflow/inductor-rocm label Sep 13, 2024
@jithunnair-amd
Collaborator

@malfet Another PR to skip unit tests and get the ROCm CI signal back to green while we investigate a fix in parallel.

@jithunnair-amd
Collaborator

@pytorchbot merge -f "Skipping these tests gets rocm workflow signal to green. Discussions ongoing on proper fix in parallel"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…ch#135995)

Pull Request resolved: pytorch#135995
Approved by: https://github.com/malfet
@jithunnair-amd
Collaborator

@pytorchbot cherry-pick --onto release/2.5 -c critical

@pytorchbot
Collaborator

Cherry picking #135995

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x b9b6094793439a8e2132d545ba9760733b02bc7c returned non-zero exit code 1

Auto-merging test/functorch/test_control_flow.py
CONFLICT (content): Merge conflict in test/functorch/test_control_flow.py
error: could not apply b9b6094793... [ROCm] Skip pointwise associative scan tests due to regression (#135995)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"

@jithunnair-amd
Collaborator

The cherry-pick PR is at #136557

atalman pushed a commit that referenced this pull request Sep 25, 2024
* [ROCm] skip test_fp8_cast_and_t on non-MI300 machines (#135917)

Fixes #ISSUE_NUMBER

Pull Request resolved: #135917
Approved by: https://github.com/malfet

(cherry picked from commit 6cdc70b)

* Skip pointwise associative scan tests due to regression (changes based on PR #135995)

* Cherry-pick fix from #135702

---------

Co-authored-by: Prachi Gupta
Co-authored-by: Jithun Nair