zdevito (Contributor) commented Mar 23, 2024

pytorch-bot bot commented Mar 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122539

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 72e98fd with merge base 29132c2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "oncall: distributed" and "release notes: distributed (c10d)" labels Mar 23, 2024
zdevito added a commit that referenced this pull request Mar 23, 2024
This reverts commit bf18e96.

ghstack-source-id: d75d108
Pull Request resolved: #122539
zdevito requested a review from wconstab March 23, 2024 01:03
zdevito requested a review from shuqiangzhang March 26, 2024 22:07
compute_duration defaults to true since retire_id is only called in the
watchdog thread, which is currently a place we call cuda APIs which may hang,
but care should be taken to avoid computing duration in any function that must
never hang. (timing must also be enabled for compute_duration - see
Contributor:

this is no longer correct as of this PR landing, but if you're fixing it in a later PR I'm happy to land this as a pure revert

Contributor:

ok looks like you aren't updating this later. better just update it here probably.

Contributor Author:

or maybe it is still correct, given the CI results of PRs later in the stack. I am going to land this as is and sort out what is happening in the CI for the later PRs.
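
For readers following the thread, the behavior the disputed code comment describes can be pictured with a minimal Python sketch. Everything here (the `Recorder` class, its `record` method, the `timing_enabled` flag, and the injectable `clock`) is hypothetical and is not PyTorch's implementation; only the names `retire_id` and `compute_duration` come from the comment. The sketch assumes duration is computed only when the per-call flag and a recorder-wide timing setting are both on, and that a caller that must never block passes `compute_duration=False`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    start: float
    end: Optional[float] = None
    duration_ms: Optional[float] = None
    retired: bool = False


class Recorder:
    """Hypothetical stand-in for a trace buffer; not PyTorch's code."""

    def __init__(self, timing_enabled: bool,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timing_enabled = timing_enabled
        self.clock = clock
        self.entries: Dict[int, Entry] = {}
        self.next_id = 0

    def record(self) -> int:
        """Start a new entry and return its id."""
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = Entry(start=self.clock())
        return entry_id

    def retire_id(self, entry_id: int, compute_duration: bool = True) -> None:
        """Mark an entry finished; optionally compute its duration.

        In the real system the duration query may block, so callers that
        must never hang would pass compute_duration=False (assumption
        drawn from the code comment under review).
        """
        entry = self.entries[entry_id]
        entry.retired = True
        entry.end = self.clock()
        # Duration needs both the per-call flag and timing enabled.
        if compute_duration and self.timing_enabled:
            entry.duration_ms = (entry.end - entry.start) * 1000.0
```

A watchdog-style caller would use the default (`retire_id(entry_id)`), while a path that must not block would call `retire_id(entry_id, compute_duration=False)` and leave `duration_ms` unset.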

zdevito (Contributor Author) commented Mar 27, 2024

@pytorchbot land

pytorch-bot bot commented Mar 27, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'land' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

zdevito (Contributor Author) commented Mar 27, 2024

@pytorchbot merge

pytorch-bot bot added the "ciflow/trunk" label Mar 27, 2024
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

zdevito added the "topic: not user facing" label Mar 27, 2024
sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
…ytorch#122539)

This reverts commit bf18e96.

It is stacked after a fix to elapsed_time that will resolve the memory issues that required the introduction of this flag.

Pull Request resolved: pytorch#122539
Approved by: https://github.com/wconstab, https://github.com/shuqiangzhang
ghstack dependencies: pytorch#122538
github-actions bot deleted the gh/zdevito/258/head branch April 27, 2024 01:54
Labels: ciflow/trunk, Merged, oncall: distributed, release notes: distributed (c10d), topic: not user facing
