
Conversation

@daisyden (Collaborator) commented Jul 17, 2025

For #114850, we will port distributed tests to Intel GPU.
We enable Intel GPU with the following methods, keeping the original code style as much as possible (see the sketch after this list):

  • instantiate_device_type_tests()
  • use torch.accelerator.current_accelerator() to determine the accelerator backend
  • enable XPU for some test paths
  • unify some common code under torch/testing/_internal for multiple backends, for example:
    • requires_nccl_version
    • _dynamo_dist_per_rank_init
    • DynamoDistributedSingleProcTestCase
    • DistTestCases
    • FSDPTestMultiThread
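
A minimal sketch of the pattern, assuming a recent PyTorch build that provides torch.accelerator and the allow_xpu flag on instantiate_device_type_tests(); the test class below is illustrative, not code from this PR:

```python
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests

class TestPortableOps(TestCase):
    def test_add(self, device):
        # `device` is injected per backend: "cpu", "cuda:0", "xpu:0", ...
        x = torch.ones(4, device=device)
        self.assertEqual((x + x).sum().item(), 8.0)

# allow_xpu=True (assumed available) instantiates the generic class for
# Intel GPUs as well, instead of CUDA/CPU only.
instantiate_device_type_tests(TestPortableOps, globals(), allow_xpu=True)

if __name__ == "__main__":
    run_tests()
```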

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @gujinghui @EikanWang @fengyuan14 @guangyey

@pytorch-bot bot commented Jul 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158533

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 32111ce with merge base 9d37c96:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the oncall: distributed and release notes: distributed (fsdp) labels Jul 17, 2025
@daisyden changed the title from "port 3 distributed test to Intel GPU and unified some common functions" to "[WIP] port 3 distributed test to Intel GPU and unified some common functions" Jul 17, 2025
@guangyey added the ciflow/xpu label Jul 17, 2025
@pytorch-bot bot removed the ciflow/xpu label Jul 17, 2025
@etaf added the ciflow/xpu label Jul 17, 2025
@pytorch-bot bot removed the ciflow/xpu label Jul 17, 2025
@daisyden (Collaborator, Author) commented:

@pytorchbot label "module: xpu"
@pytorchbot label "triaged"

@pytorch-bot bot added the module: xpu label Jul 17, 2025
@guangyey added the ciflow/xpu label Jul 17, 2025
@pytorch-bot bot removed the ciflow/xpu label Jul 18, 2025
@jingxu10 added the ciflow/xpu label Jul 18, 2025
@pytorch-bot bot removed the ciflow/xpu label Jul 23, 2025
Comment on lines 1479 to 1481:

```python
backend = c10d.get_default_backend_for_device(
    torch.accelerator.current_accelerator().type
)
```

A collaborator commented:

Suggested change:

```diff
-backend = c10d.get_default_backend_for_device(
-    torch.accelerator.current_accelerator().type
-)
+device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
+backend = c10d.get_default_backend_for_device(device_type)
```

Otherwise, torch.accelerator.current_accelerator() will return None if no accelerator is detected.
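
A runnable sketch of the guarded pattern, assuming a PyTorch build where torch.accelerator and c10d.get_default_backend_for_device are available:

```python
import torch
import torch.distributed.distributed_c10d as c10d

# current_accelerator() returns None on machines without an accelerator,
# so guard it and fall back to CPU.
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc is not None else "cpu"
backend = c10d.get_default_backend_for_device(device_type)
print(device_type, backend)  # e.g. "xpu" "xccl", or "cpu" "gloo"
```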

@daisyden (Collaborator, Author) replied:

Updated, thanks!

@guangyey previously approved these changes Jul 23, 2025

@guangyey (Collaborator) left a comment:
One nit, otherwise LGTM.

@guangyey changed the title from "[WIP] port 3 distributed test to Intel GPU and unified some common functions" to "port 3 distributed test to Intel GPU and unified some common functions" Jul 23, 2025
@guangyey moved this to Review Required in PyTorch Intel Jul 23, 2025
@guangyey dismissed their stale review Jul 23, 2025 09:23

Please help check the CI failure.

@guangyey (Collaborator) left a comment:

Thanks for the update!

@guangyey added the ciflow/xpu label Jul 24, 2025
@guangyey (Collaborator) commented:

@pytorchbot rebase

@pytorchmergebot (Collaborator) commented:

Successfully rebased daisyden/dist_upstream_s1 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout daisyden/dist_upstream_s1 && git pull --rebase)

@pytorchmergebot force-pushed the daisyden/dist_upstream_s1 branch from 323ada3 to 7d7531e on August 12, 2025 02:00
@guangyey (Collaborator) commented:

@daisyden Please fix the lint error.

@guangyey (Collaborator) commented:

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk label Aug 13, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@github-project-automation bot moved this from Review Required to Done in PyTorch Intel Aug 13, 2025
chuanhaozhuge pushed a commit that referenced this pull request Aug 14, 2025
port 3 distributed test to Intel GPU and unified some common functions (#158533)

For #114850, we will port distributed tests to Intel GPU.
We enable Intel GPU with the following methods, keeping the original code style as much as possible:

- instantiate_device_type_tests()
- use torch.accelerator.current_accelerator() to determine the accelerator backend
- enable XPU for some test paths
- unify some common code under torch/testing/_internal for multiple backends, for example:
  - requires_nccl_version
  - _dynamo_dist_per_rank_init
  - DynamoDistributedSingleProcTestCase
  - DistTestCases
  - FSDPTestMultiThread

Pull Request resolved: #158533
Approved by: https://github.com/guangyey, https://github.com/d4l3k

Co-authored-by: Yu, Guangye <[email protected]>
chuanhaozhuge pushed a commit that referenced this pull request Aug 18, 2025
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
pytorchmergebot pushed a commit that referenced this pull request Sep 6, 2025
For #114850, we will port distributed tests to Intel GPU. This PR is created based on PRs #158533 and #159473 and works on some test files under test/distributed/fsdp. We enable Intel GPU with the following methods, keeping the original code style as much as possible:

1. add allow_xpu=True in instantiate_device_type_tests() if needed
2. use torch.accelerator.current_accelerator() to determine the accelerator backend (a sketch follows this block)
3. enable XPU for some test paths
Pull Request resolved: #161601
Approved by: https://github.com/guangyey, https://github.com/d4l3k
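
A hedged sketch of step 2, replacing a hard-coded "cuda" device with the detected accelerator so the same test body also runs on XPU; the helper below is illustrative, not code from this PR:

```python
import torch

# Pick whichever accelerator is present ("cuda", "xpu", ...), else CPU.
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc is not None else "cpu"

def make_model_and_input():
    # Before the port this would have hard-coded device="cuda".
    model = torch.nn.Linear(8, 8).to(device_type)
    inp = torch.randn(2, 8, device=device_type)
    return model, inp

model, inp = make_model_and_input()
print(model(inp).shape)  # torch.Size([2, 8])
```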
daisyden added a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/trunk, ciflow/xpu, Merged, module: xpu, oncall: distributed, open source, release notes: distributed (fsdp)

Projects

Status: Done


7 participants