[CI] Fix XPU CI failure #138548
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138548
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 5d4a4d1 with merge base 6b29d40.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| "nn.functional.conv_transpose3d": {f32, f64}, | ||
| # rrelu not supported on XPU now | ||
| "nn.functional.rrelu": {f16, f32, f64}, | ||
| "histc": {i32, i64}, |
This is fixed in torch-xpu-ops. We have a histc impl now.
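For context, a minimal smoke test of the newly supported op (a sketch, not code from this PR; assumes an XPU-enabled PyTorch build with the updated torch-xpu-ops pin):

```python
import torch

# Sketch only: run histc on an XPU tensor. The removed expected failure
# covered the i32/i64 dtype variants of this op.
if torch.xpu.is_available():
    x = torch.randn(1000, device="xpu")
    # 10 equal-width bins over [-3, 3); out-of-range values are ignored.
    hist = torch.histc(x, bins=10, min=-3, max=3)
    print(hist.cpu())  # per-bin counts
```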
| "test_sgd_tensor_lr_cpu": 2, | ||
| "test_sgd_tensor_lr_cuda": 2, | ||
| "test_sgd_tensor_lr_xpu": 2, | ||
| "test_sgd_tensor_lr_foreach_xpu": 2, |
All UTs in this file are fixed by #134170
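For context, the numbers in the diff are expected Inductor kernel counts. Roughly, these UTs compile an optimizer step that uses a tensor learning rate; a minimal sketch (not the actual test code; assumes an XPU-enabled build):

```python
import torch

# Sketch: SGD with a tensor lr, stepped under torch.compile. The foreach
# variant of this pattern is what #134170 fixes on XPU.
device = "xpu" if torch.xpu.is_available() else "cpu"
p = torch.nn.Parameter(torch.randn(8, device=device))
opt = torch.optim.SGD([p], lr=torch.tensor(0.01, device=device))

@torch.compile
def step():
    opt.step()

p.grad = torch.ones_like(p)
step()
```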
# of search_fn).
self.assertTrue(pattern.pattern_eq(search_fn_pattern))

@skipIfXpu
The feature max_autotune_gemm_backends: Triton is not supported on XPU.
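For reference, a hypothetical sketch of this kind of gating (the test class and body here are invented for illustration): skipIfXpu from torch.testing._internal.common_utils skips the test on XPU, where the Triton GEMM backend for max-autotune is unavailable.

```python
import unittest

import torch
import torch._inductor.config
from torch.testing._internal.common_utils import skipIfXpu

# Hypothetical test: skipped on XPU because Inductor's max-autotune GEMM
# path cannot use the Triton backend there yet.
class PatternMatcherSmokeTest(unittest.TestCase):
    @skipIfXpu
    def test_triton_gemm_autotune(self):
        with torch._inductor.config.patch(
            max_autotune=True, max_autotune_gemm_backends="TRITON"
        ):
            compiled_mm = torch.compile(torch.mm)
            a, b = torch.randn(32, 32), torch.randn(32, 32)
            torch.testing.assert_close(compiled_mm(a, b), a @ b)
```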
# not implemented for 'Half'
"nn.functional.multilabel_margin_loss": {f16},
"nn.functional.multi_margin_loss": {f16},
"nn.functional.avg_pool3d": {f16},
The fp16 data type is now supported by avg_pool3d in torch-xpu-ops.
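A quick check of this claim (a sketch, assuming an XPU-enabled build with the updated pin):

```python
import torch
import torch.nn.functional as F

# Sketch: avg_pool3d on an fp16 XPU tensor, previously rejected with
# "not implemented for 'Half'".
if torch.xpu.is_available():
    x = torch.randn(1, 1, 8, 8, 8, device="xpu", dtype=torch.float16)
    out = F.avg_pool3d(x, kernel_size=2)
    print(out.shape, out.dtype)  # torch.Size([1, 1, 4, 4, 4]) torch.float16
```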
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command. Details for Dev Infra team: raised by workflow job.
@malfet Updated the torch-xpu-ops pin commit from a development branch to the master branch; no other code changes. Please help review again.
malfet left a comment:
Nice hashsum!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: xpu / win-vs2022-xpu-py3 / build. Details for Dev Infra team: raised by workflow job.
@malfet Because the newly updated pin commit introduces new op implementations, the expected-failure UTs in Inductor would start passing. To avoid introducing new code changes in this PR, we will revert to the previously pinned commit and let this PR land first. We will then file another PR that removes these expected failures together with a new torch-xpu-ops pin commit.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f "macos CI is always queued and this PR is unrelated to macos"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Motivation
Fix #138577.
Solution
- All failed UTs in test/inductor/test_compiled_optimizers.py are fixed by "Support more foreach ops for tensor beta support" (#134170).
- The failed UT in test/inductor/test_pattern_matcher.py was introduced by "[inductor] Preserve metadata across replace_by_example and register_replacement patterns" (#138089); we skip this UT because the feature max_autotune_gemm_backends: Triton is not supported on XPU.
- torch-xpu-ops now supports histc, so we remove the expected failure from test/inductor/test_torchinductor_opinfo.py.
- torch-xpu-ops now supports avg_pool3d for the fp16 data type, so we remove the expected failure from test/inductor/test_torchinductor_opinfo.py.
- Use GPU_TYPE instead of a hard-coded device name in the affected tests.

Additional Context
We have to update the commit pin to avoid the build failure raised by the code change to C10_UNUSED.
The new torch-xpu-ops pin commit includes:

- more foreach ops, such as foreach unary ops and foreach_clamp_max, etc.;
- avg_pool3d and max_pool3d;
- log_normal_, index_copy, and mode, etc.;
- adaptation to the C10_UNUSED code change.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov