Conversation

@nWEIdia (Collaborator) commented May 10, 2024

Fixes issues encountered in #121956

cc @ptrblck @Aidyn-A @atalman @malfet

@nWEIdia requested review from a team and jeffdaily as code owners on May 10, 2024 18:43
@pytorch-bot (bot) commented May 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125944

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 198cba4 with merge base aeb9934:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "topic: not user facing" label on May 10, 2024
soulitzer added the "triaged" label on May 11, 2024
@atalman (Contributor) commented May 13, 2024

@nWEIdia Two errors seem unrelated to this PR; however, please rebase and rerun CI to be sure.

@nWEIdia force-pushed the cuda_124_ci_docker_image branch from 75b0926 to 0135402 on May 14, 2024 02:28
@atalman (Contributor) commented May 14, 2024

@pytorchmergebot merge -f "One failure on distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD is not related"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
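
As a reference for the non-forced path the bot recommends, a merge that lets currently pending checks finish while ignoring current failures would look roughly like this (a sketch based only on the bot's own hint about -i/--ignore-current; other options may apply):

@pytorchbot merge -i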

@pytorch-bot (bot) commented May 14, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.
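
For illustration, a corrected revert command following the usage above might look like the one below. The message text is only a placeholder, not the one actually used, and nosignal is simply one of the listed classifications:

@pytorchbot revert -m "Reverting due to unrelated CI signal" -c nosignal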

@nWEIdia (Collaborator, Author) commented May 14, 2024

@clee2000 Good catch! I now realize there might be a UCC/UCX-related regression: newer UCC/UCX may not work as well with CUDA 11.8.

@nWEIdia (Collaborator, Author) commented May 14, 2024

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator)

@nWEIdia your PR has been successfully reverted.

@nWEIdia (Collaborator, Author) commented May 15, 2024

@pytorchbot merge

pytorch-bot bot added the "ciflow/trunk" label on May 15, 2024
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, Merged, open source, topic: not user facing, triaged
