-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Update test_c10d_object_collectives.py with DistributedTestBase class #145056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145056
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit 6a29ea1 with merge base 9c78fb9 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also please properly type annotation this function with callable[P,R] etc? Decorators are important to type properly as they lose all typing info of things they decorate otherwise.
| def with_comms(func=None): | |
| def with_comms(func=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| devices=["cpu","cuda","hpu"] | |
| devices=("cpu","cuda","hpu") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if the way DistributedTestBase passes device arg to test classes plays nicely with using @parametrize on test instances- have you tried this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried using both @parameterize and DistributedTestCase in the same file on different test cases (test/distributed/test_functional_api.py) but I haven't tried using them both on the same testcase so I am unsure about how they interact.
Thanks for pointing this out! I will take a look at it and see if any changes are required :)
|
@kwen2501 : Kindly help with the review. thanks |
|
Sorry just back to work this week. Will have a look soon. |
kwen2501
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We are very close to land, once the comments are addressed.
|
@kwen2501 : I had addressed your review comments ? Could you take a look? |
|
@ankurneog , @kwen2501 , @fegin : Could you help with the approval and merging ? |
|
@ankurneog , @kwen2501 : Could you help with the merging ? |
|
@pytorchbot merge -i |
|
|
|
@guangyey : Could you help with retriggering the CI and merge ? |
@kwen2501 I don't seem to have the permission to directly assign reviewers. Do i need to do anything from my side to enable that ? |
|
@pytorchbot merge |
|
@pytorchbot rebase |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Fix typo to resolve CI issues.
Fix typo to resolve CI issues. Fix issues as per review comments.
|
Successfully rebased |
287a9e6 to
6a29ea1
Compare
Merge failedReason: New commits were pushed while merging. Please rerun the merge command. Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
|
@kwen2501 , @guangyey , @ankurneog : Could you help with the merging ? The failure seems to be unrelated. |
|
"Try to land this since the failure is unrelated." |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…pytorch#145056) # MOTIVATION To generalize distributed test cases for non-CUDA devices, we are leveraging the DistributedTestBase class introduced in [PR pytorch#138216](pytorch#138216). This new class is derived from MultiProcessTestCase and abstracts the creation/deletion of process groups and other functionality for specific devices. In this PR, we extend the scope of these tests to support HPUs. # CHANGES Replaced MultiProcessTestCase with the DistributedTestBase class. Extended test functionality to include support for HPUs. Utilized instantiate_device_type_tests with targeted attributes to generate device-specific test instances. Applied the skipIfHPU decorator to skip tests that are not yet compatible with HPU devices. Pull Request resolved: pytorch#145056 Approved by: https://github.com/kwen2501, https://github.com/guangyey
MOTIVATION
To generalize distributed test cases for non-CUDA devices, we are leveraging the DistributedTestBase class introduced in PR #138216. This new class is derived from MultiProcessTestCase and abstracts the creation/deletion of process groups and other functionality for specific devices. In this PR, we extend the scope of these tests to support HPUs.
CHANGES
Replaced MultiProcessTestCase with the DistributedTestBase class.
Extended test functionality to include support for HPUs.
Utilized instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
Applied the skipIfHPU decorator to skip tests that are not yet compatible with HPU devices.
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @ankurneog