Skip to content

Conversation

@ankurneog
Copy link

@ankurneog ankurneog commented May 23, 2024

Motivation

Intel Gaudi accelerator (device name hpu) is seen to have good pass rate with the pytorch framework UTs , however being an out-of-tree device, we face challenges in adapting the device to natively run the existing pytorch UTs under pytorch/test. The UTs however is a good indicator of the device stack health and as such we run them regularly with adaptations.
Although we can add Gaudi/HPU device to generate the device specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on lot of features such as executing for specific dtypes, skipping and overriding opInfo. With significant changes introduced every Pytorch release maintaining these adaptations become difficult and time consuming.
Hence with this PR we introduce Gaudi device in common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded.
The eventual goal is to introduce Gaudi out-of-tree support as equivalent to in-tree devices

Changes

Add HPUTestBase of type DeviceTypeTestBase specifying appropriate attributes for Gaudi/HPU.
Include code to check if intel Gaudi Software library is loaded and if so, add the device to the list of devices considered for instantiation of device type tests

Additional Context

please refer the following RFC : pytorch/rfcs#63

@ankurneog ankurneog requested review from a team and mruberry as code owners May 23, 2024 09:38
@pytorch-bot
Copy link

pytorch-bot bot commented May 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126970

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ca8b215 with merge base 648625b (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ankurneog / name: Ankur Neog (ca8b215)

@ankurneog ankurneog marked this pull request as draft May 23, 2024 09:38
@ankurneog ankurneog force-pushed the ankurneog_hpu_device_ut_support branch from 67cd3e6 to e9c5a5f Compare May 24, 2024 03:28
@ankurneog ankurneog marked this pull request as ready for review May 24, 2024 03:30
@ankurneog
Copy link
Author

@mruberry , @guangyey : could you please help with the review and merge. Thank you.

@guangyey guangyey requested a review from albanD May 24, 2024 07:25
@ankurneog ankurneog force-pushed the ankurneog_hpu_device_ut_support branch from e9c5a5f to cd86ee0 Compare May 24, 2024 08:25
@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 24, 2024
@ankurneog
Copy link
Author

ankurneog commented Jun 4, 2024

@albanD /@colesbury : can you please help with the review and merge, thank you.

@ankurneog
Copy link
Author

@pytorchbot rebase

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 11, 2024

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@ankurneog ankurneog force-pushed the ankurneog_hpu_device_ut_support branch from cd86ee0 to ca8b215 Compare June 11, 2024 10:31
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ho sorry this dropped from my queue.
Of course, since we don't have CI signal for this, we won't be able to guarantee that this won't get broken but feel free to send follow ups as needed.

@albanD
Copy link
Collaborator

albanD commented Jun 11, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 11, 2024
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@ankurneog
Copy link
Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 11, 2024
@ankurneog
Copy link
Author

@albanD : Thank you , there was some label missing (user facing) and hence the merge failed, I have added that , could you please trigger a merge again.

@ankurneog
Copy link
Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@ankurneog ankurneog deleted the ankurneog_hpu_device_ut_support branch June 13, 2024 01:30
TharinduRusira pushed a commit to TharinduRusira/pytorch that referenced this pull request Jun 14, 2024
…sts (pytorch#126970)

### Motivation
Intel Gaudi accelerator (device name hpu) is seen to have good pass rate with the pytorch framework UTs , however being an out-of-tree device, we face challenges in adapting the device to natively run the existing pytorch UTs under pytorch/test. The UTs however is a good indicator of the device stack health and as such we run them regularly with adaptations.
Although we can add Gaudi/HPU device to generate the device specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on lot of features such as executing for specific dtypes, skipping and overriding opInfo. With significant changes introduced every Pytorch release maintaining these adaptations become difficult and time consuming.
Hence with this PR  we introduce Gaudi device in common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded.
The eventual goal is to introduce Gaudi out-of-tree support as equivalent to in-tree devices

### Changes
Add HPUTestBase of type DeviceTypeTestBase specifying appropriate attributes for Gaudi/HPU.
Include code to check if  intel Gaudi Software library is loaded and if so, add the device to the list of devices considered for instantiation of device type tests

### Additional Context
please refer the following RFC : pytorch/rfcs#63

Pull Request resolved: pytorch#126970
Approved by: https://github.com/albanD
ignaciobartol pushed a commit to ignaciobartol/pytorch that referenced this pull request Jun 14, 2024
…sts (pytorch#126970)

### Motivation
Intel Gaudi accelerator (device name hpu) is seen to have good pass rate with the pytorch framework UTs , however being an out-of-tree device, we face challenges in adapting the device to natively run the existing pytorch UTs under pytorch/test. The UTs however is a good indicator of the device stack health and as such we run them regularly with adaptations.
Although we can add Gaudi/HPU device to generate the device specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on lot of features such as executing for specific dtypes, skipping and overriding opInfo. With significant changes introduced every Pytorch release maintaining these adaptations become difficult and time consuming.
Hence with this PR  we introduce Gaudi device in common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded.
The eventual goal is to introduce Gaudi out-of-tree support as equivalent to in-tree devices

### Changes
Add HPUTestBase of type DeviceTypeTestBase specifying appropriate attributes for Gaudi/HPU.
Include code to check if  intel Gaudi Software library is loaded and if so, add the device to the list of devices considered for instantiation of device type tests

### Additional Context
please refer the following RFC : pytorch/rfcs#63

Pull Request resolved: pytorch#126970
Approved by: https://github.com/albanD
pytorchmergebot pushed a commit that referenced this pull request Jul 20, 2024
## Motivation
This is follow up to PR:#126970  to support Gaudi devices for Pytorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend onlyNativeDevices decorator  functionality to add additional devices

Pull Request resolved: #128584
Approved by: https://github.com/albanD
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
## Motivation
This is follow up to PR:pytorch#126970  to support Gaudi devices for Pytorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend onlyNativeDevices decorator  functionality to add additional devices

Pull Request resolved: pytorch#128584
Approved by: https://github.com/albanD
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
## Motivation
This is follow up to PR:pytorch#126970  to support Gaudi devices for Pytorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend onlyNativeDevices decorator  functionality to add additional devices

Pull Request resolved: pytorch#128584
Approved by: https://github.com/albanD
pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2024
## Motivation
This is follow up to PR:#126970, adding facility to run content for Intel Gaudi devices.
We intend to extend similar generalization for the rest of the content in test/dynamo  which is currently being written to work specifically for cuda devices. Other devices can add onto it if support is available.

## Changes
 carve out bert related content to another class
 use instantiate_device_type utility to instantiate this class for devices which support the functionality

Pull Request resolved: #130714
Approved by: https://github.com/anijain2305
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
## Motivation
This is follow up to PR:pytorch#126970, adding facility to run content for Intel Gaudi devices.
We intend to extend similar generalization for the rest of the content in test/dynamo  which is currently being written to work specifically for cuda devices. Other devices can add onto it if support is available.

## Changes
 carve out bert related content to another class
 use instantiate_device_type utility to instantiate this class for devices which support the functionality

Pull Request resolved: pytorch#130714
Approved by: https://github.com/anijain2305
pytorchmergebot pushed a commit that referenced this pull request Oct 16, 2024
… tests on HPU as well (#133975)

**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

**CHANGES**

- Add support for HPU devices within the payload function.
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
- Expand the supported_activities() function to include checks for torch.profiler.ProfilerActivity.HPU.
- Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.

Pull Request resolved: #133975
Approved by: https://github.com/briancoutinho, https://github.com/aaronenyeshi
pytorchmergebot pushed a commit that referenced this pull request Oct 29, 2024
**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

**CHANGES**
- Add support for HPU devices within the test_move_exported_model_bn using TEST_HPU flag
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
- Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
Pull Request resolved: #137863
Approved by: https://github.com/jerryzh168
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at pytorch#126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

**CHANGES**
- Add support for HPU devices within the test_move_exported_model_bn using TEST_HPU flag
- Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
- Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
Pull Request resolved: pytorch#137863
Approved by: https://github.com/jerryzh168
pytorchmergebot pushed a commit that referenced this pull request Jan 23, 2025
**MOTIVATION**

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

Other accelerators can also extend the functionality by adding the device in the devices list. ( For eg: xpu )

**CHANGES**

Create a separate class for test functions running on CUDA devices
Extend the functionality of these tests to include HPUs
Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices

Previously we had submitted some changes in #140131 . However, deleted that PR due to merge conflicts and other issues.

Pull Request resolved: #144387
Approved by: https://github.com/ankurneog, https://github.com/EikanWang, https://github.com/yanboliang, https://github.com/guangyey
pytorchmergebot pushed a commit that referenced this pull request Apr 4, 2025
This PR is related to #145476 . That PR had two files (test_functions.py and test_misc.py) . test_functions was causing CI/rebase/merge issues and hence removed for now. This PR contains only test_misc.py.

This is a continuation of #144387 .

## MOTIVATION
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

Other accelerators can also extend the functionality by adding the device in the devices list. ( For eg: xpu )

## CHANGES
Create a separate class for test functions running on CUDA devices
Extend the functionality of these tests to include HPUs
Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices

PS: Most of these changes were initially part of #147609 , but closed that PR due to merge conflicts. The review comments were handled in this PR.

Pull Request resolved: #149499
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/cyyever
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
This PR is related to pytorch#145476 . That PR had two files (test_functions.py and test_misc.py) . test_functions was causing CI/rebase/merge issues and hence removed for now. This PR contains only test_misc.py.

This is a continuation of pytorch#144387 .

## MOTIVATION
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at pytorch#126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

Other accelerators can also extend the functionality by adding the device in the devices list. ( For eg: xpu )

## CHANGES
Create a separate class for test functions running on CUDA devices
Extend the functionality of these tests to include HPUs
Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices

PS: Most of these changes were initially part of pytorch#147609 , but closed that PR due to merge conflicts. The review comments were handled in this PR.

Pull Request resolved: pytorch#149499
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/cyyever
amathewc added a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
This PR is related to pytorch#145476 . That PR had two files (test_functions.py and test_misc.py) . test_functions was causing CI/rebase/merge issues and hence removed for now. This PR contains only test_misc.py.

This is a continuation of pytorch#144387 .

## MOTIVATION
We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at pytorch#126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

Other accelerators can also extend the functionality by adding the device in the devices list. ( For eg: xpu )

## CHANGES
Create a separate class for test functions running on CUDA devices
Extend the functionality of these tests to include HPUs
Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances within the new classes
Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices

PS: Most of these changes were initially part of pytorch#147609 , but closed that PR due to merge conflicts. The review comments were handled in this PR.

Pull Request resolved: pytorch#149499
Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/cyyever
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants