@amathewc
Contributor

MOTIVATION

We recently integrated support for Intel Gaudi devices (identified as 'hpu') into the common_device_type framework via the pull request at #126970. This integration allows tests to be automatically instantiated for Gaudi devices upon loading the relevant library. Building on this development, the current pull request extends the utility of these hooks by adapting selected CUDA tests to operate on Gaudi devices. Additionally, we have confirmed that these modifications do not interfere with the existing tests on CUDA devices.

CHANGES

  • Add support for HPU devices within the payload function.
  • Use instantiate_device_type_tests with targeted attributes to generate device-specific test instances.
  • Expand the supported_activities() function to include checks for torch.profiler.ProfilerActivity.HPU.
  • Apply skipIfHPU decorator to bypass tests that are not yet compatible with HPU devices.
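The pattern behind the second and fourth bullets (one generic test body, one generated test class per device, and a per-device skip marker) can be sketched with the standard library alone. This is only an illustration under stated assumptions: it does not call PyTorch's instantiate_device_type_tests or skipIfHPU, and every name in it (AVAILABLE_DEVICES, skip_if_device, instantiate_for_devices, PayloadTests) is hypothetical.

```python
import unittest

# Hypothetical stand-in for the device types visible to the test run.
AVAILABLE_DEVICES = ("cpu", "hpu")


def skip_if_device(device_type, reason):
    """Record that a generic test should be skipped on one device type."""
    def decorator(fn):
        skips = dict(getattr(fn, "_device_skips", {}))
        skips[device_type] = reason
        fn._device_skips = skips
        return fn
    return decorator


def _bind(fn, device):
    """Wrap a generic test so it runs (or is skipped) for one device."""
    reason = getattr(fn, "_device_skips", {}).get(device)

    def test(self):
        if reason is not None:
            self.skipTest(reason)
        fn(self, device)

    return test


def instantiate_for_devices(template_cls, scope, only_for=None):
    """Create one unittest.TestCase per device from a plain template class."""
    created = {}
    for device in AVAILABLE_DEVICES:
        if only_for is not None and device not in only_for:
            continue
        members = {
            f"{name}_{device}": _bind(fn, device)
            for name, fn in vars(template_cls).items()
            if name.startswith("test_") and callable(fn)
        }
        cls_name = f"{template_cls.__name__}{device.upper()}"
        created[cls_name] = type(cls_name, (unittest.TestCase,), members)
    scope.update(created)  # expose the generated classes to test discovery
    return created


class PayloadTests:
    """Generic template: each test receives the device type as an argument."""

    def test_payload_runs(self, device):
        self.assertIn(device, AVAILABLE_DEVICES)

    @skip_if_device("hpu", "not yet supported on HPU")
    def test_accelerator_only(self, device):
        self.assertNotEqual(device, "hpu")


# Generates PayloadTestsCPU and PayloadTestsHPU in the module namespace.
generated = instantiate_for_devices(PayloadTests, globals(), only_for=("cpu", "hpu"))
```

The template is deliberately a plain class rather than a TestCase, so only the generated per-device classes are collected by the test loader.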

@pytorch-bot

pytorch-bot bot commented Aug 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133975

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c87edef with merge base 28a521e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Aug 20, 2024
@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 21, 2024
@amathewc
Contributor Author

Hi @sraikund16 :
Just wanted to remind you about this PR. Please let me know if you need any additional information or changes from my side.
Thanks!

@aaronenyeshi
Member

Could this diff be stacked on top of this refactor? #134316

@amathewc
Contributor Author

> Could this diff be stacked on top of this refactor? #134316

@aaronenyeshi: It seems that the PR mentioned (#134316) adds support for in-tree devices. Since Intel Gaudi (HPU) is an out-of-tree device, we needed a different approach. Please refer to #126970 as well.
Adding @jgong5 and @ankurneog for further comments.

@ankurneog

@aaronenyeshi: I agree with @amathewc; we can give this flexibility per file/module. Each device, based on its capabilities, can opt itself in per module/file (e.g.:

)
We should merge both changes so that the result is also useful for out-of-tree devices such as Intel Gaudi.

@aaronenyeshi aaronenyeshi requested a review from sanrise September 3, 2024 15:13
@aaronenyeshi
Member

Sounds good, cc @sanrise , @briancoutinho , @shengfukevin - please take a look. The changes are related to the execution trace test.

@amathewc
Contributor Author

> Sounds good, cc @sanrise , @briancoutinho , @shengfukevin - please take a look. The changes are related to the execution trace test.

@aaronenyeshi, @sanrise, @briancoutinho, @shengfukevin: Is there any update on this? Are any further changes needed from our side?

@briancoutinho
Contributor

Looks great, just some minor suggestions. @amathewc, can you rebase on the viable/strict branch? That should ensure all the checks pass.

Contributor


same

Contributor Author


Updated.

@amathewc
Contributor Author

@aaronenyeshi , @sanrise , @briancoutinho , @shengfukevin : Removed the extra lines as per review comments.

@amathewc
Contributor Author

@shengfukevin, @sanrise, @sraikund16: Are any further changes required from our side to merge this PR?

@amathewc
Contributor Author

@pytorchbot rebase

@pytorch-bot

pytorch-bot bot commented Sep 25, 2024

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@sraikund16
Contributor

@amathewc Can you rebase/merge using git and push to the branch to start testing?

@sraikund16
Contributor

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased profiler_test_updates onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout profiler_test_updates && git pull --rebase)

@amathewc
Contributor Author

@aaronenyeshi, @sanrise, @briancoutinho, @shengfukevin, @sraikund16: Could you help rebase/merge this? It looks like I do not have the permissions.
The CI failures are in unrelated files.

@amathewc
Contributor Author

@pytorchbot merge

@jgong5
Collaborator

jgong5 commented Oct 12, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased profiler_test_updates onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout profiler_test_updates && git pull --rebase)

@amathewc
Contributor Author

@aaronenyeshi, @sanrise, @briancoutinho, @shengfukevin, @sraikund16, @jgong5: Could you help merge this? I don't seem to be authorized to merge it. I have fixed all lint-related issues.

@pytorch-bot

pytorch-bot bot commented Oct 16, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'merger' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@ankurneog

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@ankurneog

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased profiler_test_updates onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout profiler_test_updates && git pull --rebase)

@ankurneog

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • open source
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

9 participants