Tests Generalization for multiple accelerator devices #139184
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139184
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit f033ba2 with merge base a97c6a7. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Adding @weifengpy and @awgu to review since most changes are around FSDP tests.
Thanks for your big effort! General comment: Can we have a in test code?
hi @kwen2501 , Thanks for your comments.
jgong5
left a comment
- The device id accepted by the two backends (CUDA & HPU) differs: for HPU it should be "hpu:0" on all ranks, while for CUDA it should be the rank number (self.rank / torch.cuda.current_device()).
- So for HPU the id is fixed and can be defined globally, but not for CUDA. The "if TEST_HPU" check is therefore needed, but we can avoid repeating it again and again.
- I have further updated the files, please review.
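The rule described above could be captured once in a small helper, roughly like this (a torch-free sketch; the helper name is hypothetical and does not exist in the PR):

```python
# Hypothetical helper capturing the device-id rule discussed above, with
# no PyTorch dependency: HPU wants the same "hpu:0" string on every rank,
# while CUDA wants the local rank index.
def get_device_id(rank: int, device_type: str):
    if device_type == "hpu":
        return "hpu:0"  # fixed for all ranks on HPU
    return rank         # e.g. self.rank / torch.cuda.current_device()
```

With such a helper, the per-test "if TEST_HPU" checks collapse into one call site.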
Is it possible to factor out some util functions to hide these TEST_HPU device checks and to be called by the UTs instead of introducing these device checks directly into the UTs?
Thanks @jgong5 for your comment. Can we address this as part of a different PR? Some background: what we need are platform-independent APIs, with no references to device-specific APIs. We plan to do that abstraction in a follow-up PR, where we would clean up the files to be platform independent, removing dependencies such as torch.cuda.device_count() in favor of generic APIs, e.g. torch.device.count(). That would require changes in the PyTorch Python frontend, which needs some effort and prototyping, but it would make the tests truly platform independent, and all these checks could live under the wrappers. Please share your views.
Hi @jgong5, agree with @ankurneog. We will address this feature enhancement.
@pytorchbot rebase
You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 42ede36 to 75bcddd.
Thanks for the explanation. I am still a bit confused. Does the following rule work for HPU or not?
Does the following code work for HPU or not? |
Hi @kwen2501, torch.get_device_module("hpu") doesn't work for HPU.
As @kwen2501 mentioned in one of the comments, torch.get_device_module(device.type).reset_peak_memory_stats() should work. We just need to check once whether there is any issue with an out-of-tree device like HPU. I just checked and it seems to be working:
>>> device=torch.device("hpu:0")
>>> torch.get_device_module(device.type).is_available()
True
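The dispatch that torch.get_device_module performs can be pictured with a torch-free stand-in (everything below is illustrative; the real function lives in torch and resolves actual backend modules like torch.cuda and torch.hpu):

```python
# Torch-free stand-in for the torch.get_device_module(device.type) pattern:
# one lookup keyed by the device-type string replaces per-backend if/else
# branches on torch.cuda / torch.hpu / torch.xpu.
class _BackendModule:
    def __init__(self, name):
        self.name = name
        self.peak_stats_reset = False

    def is_available(self):
        return True

    def reset_peak_memory_stats(self):
        self.peak_stats_reset = True

_MODULES = {"cuda": _BackendModule("cuda"), "hpu": _BackendModule("hpu")}

def get_device_module(device_type: str):
    # Mirrors the real API shape: resolve the backend module by string.
    return _MODULES[device_type]

# A test written against the generic call works for any registered backend.
get_device_module("hpu").reset_peak_memory_stats()
```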
I like this change. It’s a good idea to use torch.get_device_module(...) instead of relying on if...else statements with device-specific namespaces like torch.cuda, torch.hpu, or torch.xpu.
Let's keep it as:
devices = ["cuda"]
if TEST_HPU:
devices.append("hpu")
Same as the previous comment:
devices = ["cuda"]
if TEST_HPU:
devices.append("hpu")
That way new devices can just add on to the list
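The pattern suggested in these comments, sketched as plain Python (TEST_HPU here is a stand-in boolean, not the real flag from torch.testing._internal):

```python
# Stand-in for the real TEST_HPU flag; hard-coded so the sketch runs
# without PyTorch installed.
TEST_HPU = False

def build_device_list():
    # Start from CUDA and let each extra backend append itself; adding a
    # new device is then one append, not a new if/else branch in every UT.
    devices = ["cuda"]
    if TEST_HPU:
        devices.append("hpu")
    return devices
```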
Same as the comment before.
device_id is already hpu:0 or cuda:0, I think.
Sequential(FSDP(Linear(5, 5))).to(device)
should work, shouldn't it?
same as previous comment
Can we add this directly to the base class FSDPTest?
Remove this and use self.device.
Same, .to(self.device) should work.
Let's update device in the FSDPTest base class.
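What moving the device choice into the base class could look like, as a torch-free sketch (class name borrowed from the thread, flag and property simplified; the real FSDPTest lives in torch.testing._internal):

```python
# Stand-in for the real TEST_HPU flag.
TEST_HPU = False

class FSDPTestBase:
    # Hypothetical base-class property: individual UTs just use
    # self.device instead of repeating the TEST_HPU check everywhere.
    @property
    def device(self) -> str:
        return "hpu:0" if TEST_HPU else "cuda:0"

class SomeFSDPTest(FSDPTestBase):
    def run_case(self):
        # A test body would do model.to(self.device); here we just
        # return the resolved device string.
        return self.device
```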
- device_type = "hpu:0" if TEST_HPU else torch.cuda.current_device()
+ device_type = "hpu:0" if TEST_HPU else torch.get_device_module().current_device()
- device_type = "hpu:0" if TEST_HPU else torch.cuda.current_device()
+ device_type = "hpu:0" if TEST_HPU else torch.get_device_module().current_device()
Hi @kwen2501, removed almost all if/else and TEST_HPU & TEST_CUDA statements.
Hi @kwen2501, please review.
Hi @kwen2501, CI ran fine, please approve after review.
Ke is on PTO, @awgu could you take a look at this?
I found test_root_module_is_not_FSDP_cuda logs on the main branch, so I think it's something to do with test instantiation via instantiate_device_type_tests and the internal test collection collecting the non-instantiated tests.
@clee2000 can you please re-merge now? This class only instantiates tests based on the device type.
@pytorchmergebot merge
This PR needs to be approved by an authorized maintainer before merge. |
@clee2000 can you please re-merge.
@awgu has suggested @fegin as a POC for internal testing. @fegin, I believe a change like D67285322 will help with some of these tests. That being said, I do not know if we have multi-GPU machines for testing internally, so I don't know if there is any value in running them. I will defer to Andrew and Chien-Chin on what to do next, as this is the extent of my knowledge on distributed testing.
- torch.cuda.reset_peak_memory_stats()
  ).to(device_type.type)
  x = torch.randn(10000, 256, requires_grad=True).to(device_type.type)
+ torch.get_device_module(device_type.type).reset_peak_memory_stats()
Because you removed @unittest.skipIf(not torch.cuda.is_available(), "Test requires CUDA"), this line should fail when running on CPU-only devices. I got the module 'torch.cpu' has no attribute 'reset_peak_memory_stats' error with the internal CPU tests. I'm not sure why CI didn't fail though. cc @clee2000
It is disabled in CI due to #79510
I see. So we should also disable this internally.
@rahulsingh-intel Can you add an unconditional skip to this test? Thanks!
hi @fegin Added.
Thanks.
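The unconditional skip @fegin asked for could look like this (stdlib-only sketch; the test class and method names are illustrative, not the actual ones in the PR):

```python
import unittest

class MemoryStatsTest(unittest.TestCase):
    # Unconditional skip, regardless of which devices are present, since
    # torch.cpu lacks reset_peak_memory_stats (see #79510).
    @unittest.skip("disabled: torch.cpu has no reset_peak_memory_stats")
    def test_peak_memory_stats(self):
        raise AssertionError("never runs")

# Running the case shows it is collected but skipped, not failed.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(MemoryStatsTest).run(result)
```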
We can retry this PR after the internal fix is landed. @clee2000
Hi @fegin, can you please approve this now? It's taking a lot of effort in case of conflicts, and we have a dependency (#139749) on this PR. Once you approve, I will make the common changes in the remaining FSDP test modules in another PR.
@kwen2501 : could you please help with the approval of this PR. |
kwen2501
left a comment
Re-approving.
Sorry, I have been on PTO since mid-Dec, and I don't know why my previous approval was gone (perhaps a re-request of review nullified it).
@pytorchmergebot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Thank you @kwen2501 |
…#149848) The motivation for this PR is to refactor the existing test cases in the folder test/distributed/_composable/fsdp/ (fsdp2, as referred to in torchtitan) to be device agnostic, so that any accelerator type is supported (e.g. CUDA, HPU, XPU). The changes are in line with previously merged changes for the fsdp test cases (in the folder test/distributed/fsdp/): #139184. Pull Request resolved: #149848. Approved by: https://github.com/kwen2501, https://github.com/guangyey

Motivation: Generalize the unit tests so that they can be executed on CUDA and non-CUDA devices.
Dependency: #133209, merged now.
There was a previous PR #135242 for these changes, closed due to incorrect commits. I have incorporated the changes as suggested in the comments.
@kwen2501 @zeshengzong Please review the changes.
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o