Conversation


@guangyey guangyey commented Oct 22, 2024

Stack from ghstack (oldest at bottom):

Motivation

Fix #138577.

Solution

  1. All UTs in test/inductor/test_compiled_optimizers.py are fixed by #134170 (Support more foreach ops for tensor beta support).
  2. The UT in test/inductor/test_pattern_matcher.py was introduced by #138089 ([inductor] Preserve metadata across replace_by_example and register_replacement patterns); we skip it because the feature max_autotune_gemm_backends: Triton is not supported on XPU.
  3. We now have an implementation of histc, so we remove its expected failure from test/inductor/test_torchinductor_opinfo.py.
  4. avg_pool3d now supports the fp16 data type, so we remove its expected failure from test/inductor/test_torchinductor_opinfo.py.
  5. CUDA-specific code was introduced by #138472 (Bugfix for passing None args to user defined Triton kernel); we generalize it to GPU_TYPE.
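The device generalization in item 5 follows a common pattern in the Inductor test suite: instead of hardcoding the "cuda" device string, a test resolves a GPU_TYPE constant once and builds device strings from it. A minimal self-contained sketch of that pattern (the detection logic below is illustrative; the real constant comes from PyTorch's internal test utilities, which this sketch does not import):

```python
# Illustrative sketch: resolve the available GPU backend once, then build
# device strings from it instead of hardcoding "cuda". The detection here
# is a stand-in for PyTorch's internal helper, not the real implementation.

def detect_gpu_type(available_backends):
    """Pick the first available GPU backend, preferring CUDA over XPU."""
    for backend in ("cuda", "xpu"):
        if backend in available_backends:
            return backend
    return "cpu"  # fall back to CPU when no GPU backend is present

def make_device_string(gpu_type, index=0):
    """Build a device string such as 'cuda:0' or 'xpu:0' for test tensors."""
    return f"{gpu_type}:{index}"

# On a CUDA machine the same test targets "cuda:0"; on an Intel GPU
# machine it targets "xpu:0" with no further code changes.
print(make_device_string(detect_gpu_type({"xpu"})))  # -> xpu:0
```

Writing tests against the resolved constant is what lets one test file serve both CUDA and XPU CI without per-backend copies.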

Additional Context

Why update torch-xpu-ops commit pin here?

We have to update the commit pin to avoid the build failure caused by the C10_UNUSED code change.

What features does the torch-xpu-ops update bring?

  1. Add some foreach ops, such as foreach unary ops and foreach_clamp_max;
  2. Add some pooling ops (forward and backward), such as avg_pool3d and max_pool3d;
  3. Add some other ops, such as log_normal_, index_copy, and mode;
  4. Fix the build failure related to C10_UNUSED.
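For readers unfamiliar with the foreach ops mentioned above: they apply one elementwise operation across a whole list of tensors in a single fused call, which is why they matter for compiled optimizers like SGD. A pure-Python sketch of the semantics of an op like foreach_clamp_max, using plain lists in place of tensors:

```python
# Semantic sketch of a foreach op: apply clamp-to-maximum elementwise
# across a list of "tensors" (plain Python lists here). The real op
# fuses the outer loop into far fewer kernel launches on the device.

def foreach_clamp_max(tensors, max_value):
    """Clamp every element of every tensor in the list to max_value."""
    return [[min(x, max_value) for x in t] for t in tensors]

params = [[0.5, 2.0], [3.0, -1.0]]
clamped = foreach_clamp_max(params, 1.0)
print(clamped)  # -> [[0.5, 1.0], [1.0, -1.0]]
```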

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov


pytorch-bot bot commented Oct 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138548

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5d4a4d1 with merge base 6b29d40:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 22, 2024
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Oct 22, 2024
@guangyey guangyey changed the title Update torch-xpu-ops pin commit Fix XPU CI failure Oct 22, 2024
"nn.functional.conv_transpose3d": {f32, f64},
# rrelu not supported on XPU now
"nn.functional.rrelu": {f16, f32, f64},
"histc": {i32, i64},
@guangyey (Collaborator, Author) commented:

This is fixed in torch-xpu-ops. We have a histc impl now.
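For context, histc counts values into equal-width bins between a lower and upper bound. A pure-Python sketch of its semantics (illustration only; the real op operates on device tensors):

```python
def histc(values, bins, low, high):
    """Count values into `bins` equal-width buckets over [low, high].

    Values outside [low, high] are ignored, and values equal to `high`
    land in the last bucket, mirroring torch.histc's documented behavior.
    """
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if low <= v <= high:
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1
    return counts

print(histc([1, 2, 1], bins=4, low=0, high=3))  # -> [0, 2, 1, 0]
```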

"test_sgd_tensor_lr_cpu": 2,
"test_sgd_tensor_lr_cuda": 2,
"test_sgd_tensor_lr_xpu": 2,
"test_sgd_tensor_lr_foreach_xpu": 2,
@guangyey (Collaborator, Author) commented Oct 22, 2024:

All UTs in this file are fixed by #134170

# of search_fn).
self.assertTrue(pattern.pattern_eq(search_fn_pattern))

@skipIfXpu
@guangyey (Collaborator, Author) commented:

The feature max_autotune_gemm_backends: Triton is not supported on XPU.
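The @skipIfXpu decorator shown in the snippet follows the standard unittest conditional-skip pattern. A minimal sketch of how such a decorator can be built (the has_xpu flag and helper name here are illustrative, not PyTorch's exact internals):

```python
import unittest

def skip_if_xpu(has_xpu, reason="test not supported on XPU"):
    """Skip the decorated test when running on an XPU device."""
    return unittest.skipIf(has_xpu, reason)

class PatternMatcherTests(unittest.TestCase):
    @skip_if_xpu(has_xpu=True,
                 reason="max_autotune_gemm_backends: Triton not supported on XPU")
    def test_max_autotune(self):
        self.assertTrue(True)

# Running the suite records the test as skipped rather than failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatternMatcherTests)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped))  # -> 1
```

Skipping this way keeps the test discoverable and reports the reason in CI output, rather than silently deleting the coverage on XPU.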

@guangyey guangyey changed the title Fix XPU CI failure [CI] Fix XPU CI failure Oct 22, 2024
guangyey added a commit that referenced this pull request Oct 22, 2024
ghstack-source-id: 62c857f
Pull Request resolved: #138548
# not implemented for 'Half'
"nn.functional.multilabel_margin_loss": {f16},
"nn.functional.multi_margin_loss": {f16},
"nn.functional.avg_pool3d": {f16},
@guangyey (Collaborator, Author) commented:

The fp16 data type is now supported by avg_pool3d in torch-xpu-ops.
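For context, average pooling slides a fixed-size window over the input and averages the values in each window; avg_pool3d does this over three spatial dimensions. A 1-D pure-Python sketch of the semantics (illustration only; the real op runs on device tensors and, per this change, now also accepts fp16 on XPU):

```python
def avg_pool1d(values, kernel_size, stride):
    """Average each kernel_size window, stepping by stride (no padding)."""
    out = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        out.append(sum(window) / kernel_size)
    return out

print(avg_pool1d([1, 2, 3, 4], kernel_size=2, stride=2))  # -> [1.5, 3.5]
```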

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 23, 2024
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@guangyey

@malfet I updated the torch-xpu-ops pin commit from a development branch to the master branch; there are no other code changes. Please help review again.

@guangyey guangyey requested a review from malfet October 24, 2024 03:08

@malfet malfet left a comment


Nice hashsum!

@malfet

malfet commented Oct 24, 2024

@pytorchbot merge

@pytorchmergebot

Merge started


@pytorchmergebot

Merge failed

Reason: 1 jobs have failed, first few of them are: xpu / win-vs2022-xpu-py3 / build

Details for Dev Infra team Raised by workflow job

guangyey added a commit that referenced this pull request Oct 24, 2024
ghstack-source-id: 7c97def
Pull Request resolved: #138548
@guangyey

@malfet The newly updated pin commit introduces some new op implementations, which fix several expected-failure UTs in Inductor. To avoid introducing new code changes in this PR, we will change back to the previously updated pin commit and let this PR land first. We will then file another PR that removes these expected failures together with a new torch-xpu-ops pin commit.

@guangyey

@pytorchbot merge

@pytorchmergebot

Merge started


@guangyey

@pytorchbot merge -f "macos CI is always queued and this PR is unrelated to macos"

@pytorchmergebot

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@github-actions github-actions bot deleted the gh/guangyey/80/head branch November 25, 2024 02:10

Status: Done

6 participants