Adapt Dynamo tests to HPUs using instantiate_device_type_tests #144387
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144387
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 1 Pending as of commit 025617a with merge base d95a6ba. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 377ed25 to 91fed86.
@yanboliang: Did you get a chance to review this one?

Can you fix these failed unit tests?

@yanboliang: Have fixed the unit test issues.

@yanboliang: The 2 new failures in CI are not related to the changes in this PR.

@pytorchbot rebase

"Try to land this since the failure is unrelated."

This PR needs to be approved by an authorized maintainer before merge.

@EikanWang: Can you please help with the review and approval? Thanks.
yanboliang left a comment:
LGTM, please fix the failed test.
@yanboliang: Looks like the failing tests are not related to the changes we made.

@amathewc This test should only be running on: pytorch/test/dynamo/test_repros.py, lines 4389 to 4393 in 00ffeca
It seems that only the following test cases apply to hpu; the other test cases in ReproTestsDevice still require CUDA. I question why we must move those other test cases from ReproTests to ReproTestsDevice.
- test_sub_alpha_scalar_repro
- test_guard_default_device
- test_megablocks_moe
@EikanWang: The idea was to keep all device tests (whether the device is cuda, hpu, xpu, or any other) within the ReproTests"Device" class. It felt more logical to do so, and if these tests are enabled on any of the devices in future, that can be done easily.
The idea makes sense. However, I cannot discern the rule for which test cases should be moved to ReproTestsDevice. What is the logic behind moving these particular test cases from ReproTests to ReproTestsDevice, even though some of them are still decorated with requires_cuda?
- If this PR intends to generalize ReproTests, the current state is only an intermediate step.
- If this PR intends to enable HPU through instantiate_device_type_tests, it should not move cuda-specific cases to ReproTestsDevice.
The intention is to keep CPU-only tests in ReproTests, and device-based tests, which are instantiated using instantiate_device_type_tests, in ReproTestsDevice.
Right now the devices tuple contains only ("cuda", "hpu"), and some specific test cases work only on CUDA devices today, hence they have been marked requires_cuda. This approach gives us several kinds of flexibility (see the sketch below):
- If we have to add test cases in future that work only on hpu, they can be added to the ReproTestsDevice class later with a similar requires_hpu tag.
- If any of the test cases marked requires_cuda becomes supported on HPU, the decorator can simply be removed.
- If another device is added, its test cases can be enabled easily by adding the device to the devices tuple.
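To make the split concrete, here is a minimal sketch under the assumptions stated in its comments: the devices tuple, class names, test bodies, and the locally defined requires_cuda helper are illustrative and are not the actual contents of test_repros.py.

```python
# Minimal sketch of the ReproTests / ReproTestsDevice split discussed above.
# Not the actual test_repros.py diff; names and bodies are illustrative.
import unittest

import torch
from torch._dynamo.test_case import TestCase, run_tests
from torch.testing._internal.common_device_type import instantiate_device_type_tests

devices = ("cuda", "hpu")  # assumed devices tuple, per the comment above

# Illustrative stand-in for the requires_cuda decorator mentioned in the thread.
requires_cuda = unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")


class ReproTests(TestCase):
    # CPU-only repro tests stay in this class.
    def test_cpu_only_repro(self):
        fn = torch.compile(torch.sin)
        x = torch.randn(8)
        self.assertTrue(torch.allclose(fn(x), torch.sin(x)))


class ReproTestsDevice(TestCase):
    # Device-generic tests take a `device` argument that
    # instantiate_device_type_tests fills in for each device type.
    def test_device_repro(self, device):
        fn = torch.compile(torch.cos)
        x = torch.randn(8, device=device)
        self.assertTrue(torch.allclose(fn(x), torch.cos(x)))

    # Tests that currently run only on CUDA keep their decorator even though
    # they live in the device class; removing the decorator later enables HPU.
    @requires_cuda
    def test_cuda_only_repro(self, device):
        ...


# Generates per-device classes (e.g. ReproTestsDeviceCUDA, ReproTestsDeviceHPU)
# from the registered device-type test bases that match `devices`.
instantiate_device_type_tests(ReproTestsDevice, globals(), only_for=devices)

if __name__ == "__main__":
    run_tests()
```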
Thanks for the elaboration. LGTM.
ankurneog left a comment:
LGTM
@EikanWang, @kwen2501: Could you review this PR?

@pytorchmergebot : merge

❌ 🤖 pytorchbot command failed: Try

@pytorchmergebot merge

Merge failed. Reason: Approvers from one of the following sets are needed:

@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Fix unit test failures:
- Remove TEST_HPU flag.
- Fix test_debug_utils.py.
- Fix issue in test_repros.py which was causing test_flash_attn_backward_mixed_strides to fail.
- Remove skipIfHpu decorator for tests which already have the requires_cuda decorator, as per review comments.
Successfully rebased.

Force-pushed 2806998 to 025617a.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR is related to #145476. That PR had two files (test_functions.py and test_misc.py); test_functions was causing CI/rebase/merge issues and has hence been removed for now. This PR contains only test_misc.py. This is a continuation of #144387.

## MOTIVATION
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices. Other accelerators can also extend the functionality by adding the device to the devices list (e.g. xpu).

## CHANGES
- Create a separate class for test functions running on CUDA devices
- Extend the functionality of these tests to include HPUs
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
- Apply the skipIfHpu decorator to bypass tests that are not yet compatible with HPU devices

PS: Most of these changes were initially part of #147609, but that PR was closed due to merge conflicts. The review comments were handled in this PR.
Pull Request resolved: #149499
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/cyyever
MOTIVATION
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.
Other accelerators can also extend the functionality by adding the device to the devices list (e.g. xpu).
CHANGES
- Create a separate class for test functions running on CUDA devices
- Extend the functionality of these tests to include HPUs
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
- Apply the skipIfHpu decorator to bypass tests that are not yet compatible with HPU devices (see the sketch after this list)
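As a rough illustration of the pattern described in the CHANGES above (not the actual diff), the sketch below assumes a devices list of ["cuda", "hpu"], an illustrative MiscTestsDevice class, and a trivial test body; it also marks where an additional accelerator such as xpu would be appended.

```python
# Hedged sketch of the CHANGES above; class, test, and device names are
# illustrative assumptions, not the contents of test_misc.py.
import torch
from torch._dynamo.test_case import TestCase, run_tests
from torch.testing._internal.common_device_type import instantiate_device_type_tests

# Other accelerators opt in by appending their device type here, e.g. "xpu".
devices = ["cuda", "hpu"]


class MiscTestsDevice(TestCase):
    # The `device` argument is supplied per device type by
    # instantiate_device_type_tests; tests not yet working on HPU would carry
    # the skip-on-HPU decorator mentioned above (its import is omitted here).
    def test_add_on_device(self, device):
        a = torch.ones(4, device=device)
        b = torch.full((4,), 2.0, device=device)
        self.assertTrue(torch.equal(a + b, torch.full((4,), 3.0, device=device)))


# Creates MiscTestsDeviceCUDA / MiscTestsDeviceHPU (and MiscTestsDeviceXPU if
# "xpu" is added) from the registered device-type test bases matching `devices`.
instantiate_device_type_tests(MiscTestsDevice, globals(), only_for=devices)

if __name__ == "__main__":
    run_tests()
```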
Previously we had submitted some changes in #140131; however, that PR was deleted due to merge conflicts and other issues.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @ankurneog