Generalization of FSDP common for non-cuda execution #133209
Conversation
Dr. CI bot: 🔗 Helpful Links — 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133209. Note: links to docs will display an error until the docs builds have completed. ✅ You can merge normally! (2 unrelated failures.) As of commit 55cdd62 with merge base 76b044d. FLAKY: the following jobs failed but were likely due to flakiness present on trunk. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 4b7e235 to c12afd5.
@wconstab, who would have some bandwidth to do these FSDP updates to make FSDP work with non-CUDA devices?

@wconstab: Can you please help with the review and approval? The main changes are here: https://github.com/pytorch/pytorch/pull/133209/files#diff-7b5c66f99161fa6a3d9042e80f8c8cc140a64e43445feede46f55e53154f6c3d (torch/testing/_internal/common_fsdp.py); the other changes are mostly adaptations to what was changed in that file.
just curious, why the difference?
+1, please help us unify code paths to reduce maintenance load.
wconstab left a comment:
Mostly LGTM. A couple of small questions and then I would stamp.
Also, FWIW, you may want to look into FSDP2 support as well, if you haven't.
TODO: is this actually unchanged in the new formula?

As long as TEST_CUDA := torch.cuda.is_available(), then DEVICE_COUNT := torch.cuda.device_count(), so this appears equivalent.
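A minimal sketch of the equivalence being discussed, under stated assumptions: the names below are illustrative rather than the PR's exact identifiers, and `torch.hpu` only exists once `habana_frameworks.torch` has been imported on a Gaudi system:

```python
import torch

TEST_CUDA = torch.cuda.is_available()
# Assumption: an HPU flag analogous to TEST_CUDA.
TEST_HPU = getattr(torch, "hpu", None) is not None and torch.hpu.is_available()

if TEST_CUDA:
    DEVICE_COUNT = torch.cuda.device_count()
elif TEST_HPU:
    DEVICE_COUNT = torch.hpu.device_count()  # assumed to mirror the torch.cuda API
else:
    DEVICE_COUNT = 1  # single-device (CPU) fallback

# On a CUDA machine this reduces to the old definitions:
# TEST_CUDA == torch.cuda.is_available() and
# DEVICE_COUNT == torch.cuda.device_count().
```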
Is this a behavior change? Were we relying on set_device for CUDA to cover this before, and now we make it explicit?
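For context, a hedged sketch of the kind of explicit per-rank device selection this question refers to; the helper name and the HPU branch are assumptions, while `torch.cuda.set_device` is the real CUDA call:

```python
import torch

def set_rank_device(rank: int) -> None:
    # Hypothetical helper: before the generalization, tests could lean on a
    # CUDA-only torch.cuda.set_device(rank) elsewhere; a generalized path has
    # to select the device explicitly per backend.
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
    elif getattr(torch, "hpu", None) is not None:
        torch.hpu.set_device(rank)  # assumption: hpu mirrors the cuda module API
```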
Force-pushed c12afd5 to a8d9bee.
@wconstab: thank you for your review comments. I have addressed them and also replied to your queries. Could you please have a look and help with the approval? Thank you.

@wconstab: gentle reminder, could you kindly do the needful? Thanks.
Nice effort.
Force-pushed a8d9bee to d94ed2e.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased.
Force-pushed d94ed2e to ab54f83.
Force-pushed d05c785 to ee5db28.
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
Force-pushed 96817f5 to 7ec516d.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased.
Force-pushed 7ec516d to 55cdd62.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```python
if TEST_HPU:
    time.sleep(self.delay_before_reduction_ms / 1000)
elif TEST_CUDA:
    torch.cuda._sleep(int(self.delay_after_loss_ms * get_cycles_per_ms()))
```
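For context (not part of the diff): `time.sleep` blocks the host thread, while `torch.cuda._sleep` enqueues a kernel that spins on the GPU for a given number of cycles, which is why `get_cycles_per_ms` is used to convert milliseconds into cycles. A small sketch of the two delay styles, with an illustrative delay value:

```python
import time

import torch
from torch.testing._internal.common_utils import get_cycles_per_ms

delay_ms = 250  # illustrative value, not from the PR

# Host-side delay: blocks the Python thread; works for any backend (HPU here).
time.sleep(delay_ms / 1000)

# Device-side delay (CUDA only): spins on the GPU without blocking the host.
# _sleep is a private testing helper that takes a cycle count.
if torch.cuda.is_available():
    torch.cuda._sleep(int(delay_ms * get_cycles_per_ms()))
```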
why is HPU sleeping based on delay_before_reduction_ms instead of delay_after_loss_ms?
This one is new; I just added it.

I am kind of confused why some of the comments left by previous reviewers were left unaddressed or without response.

Could you please let me know what needs to be addressed?
Motivation: Generalize unit tests so that they can be executed for CUDA and non-CUDA devices. Dependency: #133209, merged now. There was a #135242 for these changes, closed due to incorrect commits. I have incorporated the changes as suggested in the comments. @kwen2501 @zeshengzong Please review the changes. Pull Request resolved: #139184. Approved by: https://github.com/kwen2501. Co-authored-by: Yu, Guangye <[email protected]>
Motivation
The FSDP common code for FSDP unit-test execution is mostly written with the CUDA device in mind, yet other devices such as the Intel Gaudi support most of the functionality. We are generalizing the base content so that the unit tests can be executed on non-CUDA devices.
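A minimal sketch of the generalization pattern described above, under stated assumptions: `DEVICE_TYPE` and `model_on_rank` are illustrative names rather than the identifiers actually used in common_fsdp.py, and `torch.hpu` is only present once `habana_frameworks.torch` is loaded on a Gaudi system:

```python
import torch

# Pick a backend by availability instead of hard-coding "cuda".
if torch.cuda.is_available():
    DEVICE_TYPE = "cuda"
elif getattr(torch, "hpu", None) is not None and torch.hpu.is_available():
    DEVICE_TYPE = "hpu"  # Intel Gaudi
else:
    DEVICE_TYPE = "cpu"

def model_on_rank(rank: int) -> torch.nn.Module:
    # Tests place modules on DEVICE_TYPE rather than assuming f"cuda:{rank}".
    device = "cpu" if DEVICE_TYPE == "cpu" else f"{DEVICE_TYPE}:{rank}"
    return torch.nn.Linear(8, 8).to(device)
```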
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o