@wdziurdz
Contributor

@wdziurdz wdziurdz commented Mar 6, 2025

Fixes #148661

@pytorch-bot

pytorch-bot bot commented Mar 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148663

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Pending, 1 Unrelated Failure

As of commit 9653c4b with merge base 79aa174:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wdziurdz
Contributor Author

wdziurdz commented Mar 6, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 6, 2025
@wdziurdz
Contributor Author

wdziurdz commented Mar 6, 2025

@sraikund16 Please review the current changes. This commit fixes the incorrect availabilities for HPU devices that were introduced in PR #148182.

@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 6, 2025
jeromean
jeromean previously approved these changes Mar 11, 2025
@HBN-MichalSzy

@albanD can you review and merge? It's a small HPU-only bug fix, and it would be great if it still made it into PT2.7.

@wdziurdz
Contributor Author

@Skylion007 Could you please review these changes? It's a small bug fix for HPU only.

Skylion007
Skylion007 previously approved these changes Mar 11, 2025
@Skylion007
Collaborator

@pytorchbot merge

@pytorch-bot

pytorch-bot bot commented Mar 11, 2025

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

EikanWang
EikanWang previously approved these changes Mar 12, 2025
@EikanWang EikanWang added ciflow/trunk Trigger trunk jobs on your pull request keep-going Don't stop on first failure, keep running tests until the end labels Mar 12, 2025
@EikanWang
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@wdziurdz
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


albanD
albanD previously approved these changes Mar 12, 2025
Collaborator

@albanD albanD left a comment


Refactoring the code to be device-agnostic would be best. But sure.
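
To illustrate what a device-agnostic availability check could look like, here is a minimal sketch in plain Python. It is not the code from this PR or from the PyTorch profiler: the registry, `register_device`, `available_devices`, and the placeholder probes are all hypothetical names chosen for illustration. The idea is that each backend registers its own probe, so the caller never needs a per-device branch.

```python
from typing import Callable, Dict, List

# Hypothetical registry: device name -> zero-argument probe that returns
# True when the device can be used. Not part of PyTorch.
_AVAILABILITY_PROBES: Dict[str, Callable[[], bool]] = {}


def register_device(name: str, probe: Callable[[], bool]) -> None:
    """Register an availability probe for a device type."""
    _AVAILABILITY_PROBES[name] = probe


def available_devices() -> List[str]:
    """Return the registered devices whose probe reports True."""
    return [name for name, probe in _AVAILABILITY_PROBES.items() if probe()]


# Placeholder probes; a real backend would query its own runtime here.
register_device("cpu", lambda: True)
register_device("hpu", lambda: False)

print(available_devices())
```

With this shape, adding a new backend means registering one more probe rather than editing the availability logic itself.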

@ZainRizvi
Contributor

@pytorchbot revert -c ghfirst -m "Sorry but this is breaking internally. @albanD, could you please help get this relanded? See D71052806 for more details. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts"

Errors look like:

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: nan at index (0,) (up to 1e-05 allowed)
Greatest relative difference: nan at index (0,) (up to 1.3e-06 allowed)
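
For readers unfamiliar with this report format: the 1e-05 and 1.3e-06 limits appear to match the default float32 absolute and relative tolerances of `torch.testing.assert_close`. The sketch below, in plain Python, shows the usual closeness rule `|actual - expected| <= atol + rtol * |expected|` and why NaN values count as mismatches. `is_close` and `count_mismatches` are hypothetical helpers written for illustration, and the input values are made up; they are not taken from the failing internal test.

```python
import math


def is_close(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Closeness rule: |actual - expected| <= atol + rtol * |expected|."""
    # NaN never compares as close here (no equal_nan behaviour in this sketch).
    if math.isnan(actual) or math.isnan(expected):
        return False
    return abs(actual - expected) <= atol + rtol * abs(expected)


def count_mismatches(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Count the element pairs that fail the closeness rule."""
    return sum(not is_close(a, e, rtol, atol) for a, e in zip(actual, expected))


nan = float("nan")
mismatched = count_mismatches([nan, nan, nan, nan], [1.0, 2.0, 3.0, 4.0])
print(f"Mismatched elements: {mismatched} / 4")  # Mismatched elements: 4 / 4
```

A result made up entirely of NaN therefore fails on every element, which is consistent with the 100% mismatch reported above.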

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

pytorchmergebot added a commit that referenced this pull request Mar 12, 2025
This reverts commit 28b7880.

Reverted #148663 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @albanD, could you please help get this relanded? See D71052806 for more details. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts
@pytorchmergebot
Collaborator

@wdziurdz your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Mar 12, 2025
@pytorch-bot pytorch-bot bot dismissed stale reviews from jeromean, Skylion007, EikanWang, and albanD March 12, 2025 22:52

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

Collaborator

@albanD albanD left a comment


I'm pretty sure the internal failure is unrelated.

@EikanWang
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@EikanWang
Collaborator

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 5 checks: pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / build, pull / linux-focal-cuda12.6-py3.10-gcc11 / build, pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build, pull / linux-focal-cpu-py3.10-gcc11-bazel-test / build-and-test (default, 1, 1, linux.4xlarge), trunk / libtorch-linux-focal-cuda12.4-py3.10-gcc9-debug / build


@wdziurdz
Contributor Author

@pytorchbot cherry-pick --onto release/2.7 --fixes "Fix incorrect availability for HPU-only devices in the profiler" -c regression

@pytorchbot
Collaborator

Cherry picking #148663

The cherry pick PR is at #149115 and it is linked with issue Fix incorrect availability for HPU-only devices in the profiler. The following tracker issues are updated:

Details for Dev Infra team Raised by workflow job


Labels

ci-no-td: Do not run TD on this PR
ciflow/trunk: Trigger trunk jobs on your pull request
keep-going: Don't stop on first failure, keep running tests until the end
Merged
open source
Reverted
topic: not user facing: topic category
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Profiler][HPU] Incorrect availabilities for the HPU device

10 participants