Conversation


@guangyey guangyey commented Oct 22, 2024

Stack from ghstack (oldest at bottom):

Motivation

Fix #138577.

Solution

  1. All UTs in test/inductor/test_compiled_optimizers.py are fixed by #134170 (Support more foreach ops for tensor beta support).
  2. The UT in test/inductor/test_pattern_matcher.py was introduced by #138089 ([inductor] Preserve metadata across replace_by_example and register_replacement patterns); we skip it because the feature max_autotune_gemm_backends: Triton is not supported on XPU.
  3. We now have an implementation of histc, so we remove its expected failure from test/inductor/test_torchinductor_opinfo.py.
  4. avg_pool3d now supports the fp16 data type, so we remove its expected failure from test/inductor/test_torchinductor_opinfo.py.
  5. CUDA-specific code was introduced by #138472 (Bugfix for passing None args to user defined Triton kernel); we generalize it to GPU_TYPE.
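The device generalization in item 5 follows a common pattern in the Inductor test suite: instead of hardcoding the "cuda" device string, a test resolves a GPU_TYPE constant once and builds device strings from it. A minimal self-contained sketch of that pattern (the detection logic below is illustrative; the real constant comes from PyTorch's internal test utilities, which this sketch does not import):

```python
# Illustrative sketch: resolve the available GPU backend once, then build
# device strings from it instead of hardcoding "cuda". The detection here
# is a stand-in for PyTorch's internal helper, not the real implementation.

def detect_gpu_type(available_backends):
    """Pick the first available GPU backend, preferring CUDA over XPU."""
    for backend in ("cuda", "xpu"):
        if backend in available_backends:
            return backend
    return "cpu"  # fall back to CPU when no GPU backend is present

def make_device_string(gpu_type, index=0):
    """Build a device string such as 'cuda:0' or 'xpu:0' for test tensors."""
    return f"{gpu_type}:{index}"

# On a CUDA machine the same test targets "cuda:0"; on an Intel GPU
# machine it targets "xpu:0" with no further code changes.
print(make_device_string(detect_gpu_type({"xpu"})))  # -> xpu:0
```

Writing tests against the resolved constant is what lets one test file serve both CUDA and XPU CI without per-backend copies.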

Additional Context

Why update torch-xpu-ops commit pin here?

We have to update the commit pin to avoid the build failure caused by the C10_UNUSED code change.

What features does the torch-xpu-ops update bring?

  1. Add some foreach ops, such as foreach unary ops and foreach_clamp_max;
  2. Add some pooling ops (forward and backward), such as avg_pool3d and max_pool3d;
  3. Add some other ops, such as log_normal_, index_copy, and mode;
  4. Fix the build failure related to C10_UNUSED.
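For readers unfamiliar with the foreach ops mentioned above: they apply one elementwise operation across a whole list of tensors in a single fused call, which is why they matter for compiled optimizers like SGD. A pure-Python sketch of the semantics of an op like foreach_clamp_max, using plain lists in place of tensors:

```python
# Semantic sketch of a foreach op: apply clamp-to-maximum elementwise
# across a list of "tensors" (plain Python lists here). The real op
# fuses the outer loop into far fewer kernel launches on the device.

def foreach_clamp_max(tensors, max_value):
    """Clamp every element of every tensor in the list to max_value."""
    return [[min(x, max_value) for x in t] for t in tensors]

params = [[0.5, 2.0], [3.0, -1.0]]
clamped = foreach_clamp_max(params, 1.0)
print(clamped)  # -> [[0.5, 1.0], [1.0, -1.0]]
```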

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov


pytorch-bot bot commented Oct 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138548

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5d4a4d1 with merge base 6b29d40:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 22, 2024
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Oct 22, 2024
@guangyey guangyey changed the title Update torch-xpu-ops pin commit Fix XPU CI failure Oct 22, 2024
"nn.functional.conv_transpose3d": {f32, f64},
# rrelu not supported on XPU now
"nn.functional.rrelu": {f16, f32, f64},
"histc": {i32, i64},
@guangyey (Collaborator, Author) commented:

This is fixed in torch-xpu-ops. We have a histc impl now.
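For context, histc counts values into equal-width bins between a lower and upper bound. A pure-Python sketch of its semantics (illustration only; the real op operates on device tensors):

```python
def histc(values, bins, low, high):
    """Count values into `bins` equal-width buckets over [low, high].

    Values outside [low, high] are ignored, and values equal to `high`
    land in the last bucket, mirroring torch.histc's documented behavior.
    """
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if low <= v <= high:
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1
    return counts

print(histc([1, 2, 1], bins=4, low=0, high=3))  # -> [0, 2, 1, 0]
```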

"test_sgd_tensor_lr_cpu": 2,
"test_sgd_tensor_lr_cuda": 2,
"test_sgd_tensor_lr_xpu": 2,
"test_sgd_tensor_lr_foreach_xpu": 2,
@guangyey (Collaborator, Author) commented Oct 22, 2024:

All UTs in this file are fixed by #134170

# of search_fn).
self.assertTrue(pattern.pattern_eq(search_fn_pattern))

@skipIfXpu
@guangyey (Collaborator, Author) commented:

The feature max_autotune_gemm_backends: Triton is not supported on XPU.
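The @skipIfXpu decorator shown in the snippet follows the standard unittest conditional-skip pattern. A minimal sketch of how such a decorator can be built (the has_xpu flag and helper name here are illustrative, not PyTorch's exact internals):

```python
import unittest

def skip_if_xpu(has_xpu, reason="test not supported on XPU"):
    """Skip the decorated test when running on an XPU device."""
    return unittest.skipIf(has_xpu, reason)

class PatternMatcherTests(unittest.TestCase):
    @skip_if_xpu(has_xpu=True,
                 reason="max_autotune_gemm_backends: Triton not supported on XPU")
    def test_max_autotune(self):
        self.assertTrue(True)

# Running the suite records the test as skipped rather than failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatternMatcherTests)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped))  # -> 1
```

Skipping this way keeps the test discoverable and reports the reason in CI output, rather than silently deleting the coverage on XPU.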

@guangyey guangyey changed the title Fix XPU CI failure [CI] Fix XPU CI failure Oct 22, 2024
guangyey added a commit that referenced this pull request Oct 22, 2024
ghstack-source-id: 62c857f
Pull Request resolved: #138548
# not implemented for 'Half'
"nn.functional.multilabel_margin_loss": {f16},
"nn.functional.multi_margin_loss": {f16},
"nn.functional.avg_pool3d": {f16},
@guangyey (Collaborator, Author) commented:

The fp16 data type is now supported by avg_pool3d in torch-xpu-ops.
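For context, average pooling slides a fixed-size window over the input and averages the values in each window; avg_pool3d does this over three spatial dimensions. A 1-D pure-Python sketch of the semantics (illustration only; the real op runs on device tensors and, per this change, now also accepts fp16 on XPU):

```python
def avg_pool1d(values, kernel_size, stride):
    """Average each kernel_size window, stepping by stride (no padding)."""
    out = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        out.append(sum(window) / kernel_size)
    return out

print(avg_pool1d([1, 2, 3, 4], kernel_size=2, stride=2))  # -> [1.5, 3.5]
```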

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 23, 2024
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@guangyey

@malfet I updated the torch-xpu-ops pin commit from a development branch to the master branch; there are no other code changes. Please help review again.

@guangyey guangyey requested a review from malfet October 24, 2024 03:08

@malfet malfet left a comment


Nice hashsum!

@malfet

malfet commented Oct 24, 2024

@pytorchbot merge

@pytorchmergebot

Merge started


@pytorchmergebot

Merge failed

Reason: 1 jobs have failed, first few of them are: xpu / win-vs2022-xpu-py3 / build

Details for Dev Infra team Raised by workflow job

guangyey added a commit that referenced this pull request Oct 24, 2024
ghstack-source-id: 7c97def
Pull Request resolved: #138548
@guangyey

@malfet The newly updated pin commit introduces some new op implementations, which fix several expected-failure UTs in Inductor. To avoid introducing new code changes in this PR, we will change back to the previously updated pin commit and let this PR land first. We will then file another PR that removes these expected failures together with a new torch-xpu-ops pin commit.

@guangyey

@pytorchbot merge

@pytorchmergebot

Merge started


@guangyey

@pytorchbot merge -f "macos CI is always queued and this PR is unrelated to macos"

@pytorchmergebot

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@github-actions github-actions bot deleted the gh/guangyey/80/head branch November 25, 2024 02:10

Status: Done

6 participants