Conversation

@daisyden
Collaborator

@daisyden daisyden commented Aug 27, 2025

For #114850, we will port distributed tests to Intel GPU. This PR is based on PRs #158533 and #159473 and covers some test files under test/distributed/fsdp. We enable Intel GPU with the following methods, keeping the original code style as much as possible:

  1. Add allow_xpu=True in instantiate_device_type_tests() where needed.

  2. Use torch.accelerator.current_accelerator() to determine the accelerator backend.

  3. Enable XPU for the affected test paths (a minimal sketch of the pattern follows this list).
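
A minimal sketch of the resulting pattern (illustrative only, not the exact diff in this PR; `ExampleDeviceTest` is a hypothetical test class):

```python
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests

# Method 2: derive the backend from torch.accelerator instead of hard-coding "cuda".
device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
print(f"running on backend: {device_type}")


class ExampleDeviceTest(TestCase):
    def test_ones(self, device):
        # `device` is filled in per backend (e.g. "cuda:0" or "xpu:0") by the harness.
        t = torch.ones(2, 2, device=device)
        self.assertEqual(t.sum().item(), 4.0)


# Method 1: allow_xpu=True opts the generated tests in to the Intel GPU (XPU) backend.
instantiate_device_type_tests(ExampleDeviceTest, globals(), allow_xpu=True)

if __name__ == "__main__":
    run_tests()
```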

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

daisyden added 28 commits July 30, 2025 12:59
…evice arg is None. Revised the _initialized checking in test_store.py and test_c10d_common.py
@pytorch-bot

pytorch-bot bot commented Aug 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161601

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8171d24 with merge base 73eb451:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

nn.AdaptiveAvgPool2d(output_size=(1, 1)),
nn.Flatten(),
)
self.device = torch.cuda.current_device()
Collaborator

self.device is unused?

Collaborator Author

Yes, I found it is unused.

Collaborator

nice!
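
As an aside (not part of this PR's diff): if such a field were ever needed, a device-agnostic equivalent of torch.cuda.current_device() could be built from the accelerator API. A hedged sketch, assuming torch.accelerator.current_device_index() is available in the PyTorch build:

```python
import torch

# Device-agnostic stand-in for torch.cuda.current_device(); illustrative only.
acc = torch.accelerator.current_accelerator()  # torch.device or None
if acc is not None:
    # current_device_index() returns the active index for the current backend
    # (cuda, xpu, ...); assumed available here.
    device = torch.device(acc.type, torch.accelerator.current_device_index())
else:
    device = torch.device("cpu")
print(device)  # e.g. "cuda:0", "xpu:0", or "cpu"
```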



device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
device_count = torch.accelerator.device_count()
Collaborator

unused?

Collaborator

@guangyey guangyey left a comment

LGTM.

@guangyey guangyey requested a review from d4l3k September 1, 2025 08:24
@guangyey guangyey changed the title [WIP] [3/N] Enable 6 fsdp test on Intel GPU [3/N] Enable 6 fsdp test on Intel GPU Sep 1, 2025
@guangyey guangyey moved this to Review Required in PyTorch Intel Sep 1, 2025
Member

@d4l3k d4l3k left a comment

LGTM

@daisyden
Collaborator Author

daisyden commented Sep 5, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/161601/head returned non-zero exit code 1

Rebasing (1/20)
Auto-merging test/distributed/test_c10d_common.py
CONFLICT (content): Merge conflict in test/distributed/test_c10d_common.py
Auto-merging test/distributed/test_c10d_functional_native.py
CONFLICT (content): Merge conflict in test/distributed/test_c10d_functional_native.py
Auto-merging test/distributed/test_device_mesh.py
CONFLICT (content): Merge conflict in test/distributed/test_device_mesh.py
Auto-merging test/distributed/test_dynamo_distributed.py
CONFLICT (content): Merge conflict in test/distributed/test_dynamo_distributed.py
Auto-merging test/distributed/test_inductor_collectives.py
CONFLICT (content): Merge conflict in test/distributed/test_inductor_collectives.py
Auto-merging test/distributed/test_store.py
CONFLICT (content): Merge conflict in test/distributed/test_store.py
Auto-merging test/distributions/test_distributions.py
Auto-merging torch/distributed/distributed_c10d.py
CONFLICT (content): Merge conflict in torch/distributed/distributed_c10d.py
Auto-merging torch/testing/_internal/common_distributed.py
CONFLICT (content): Merge conflict in torch/testing/_internal/common_distributed.py
Auto-merging torch/testing/_internal/common_fsdp.py
Auto-merging torch/testing/_internal/distributed/fake_pg.py
CONFLICT (content): Merge conflict in torch/testing/_internal/distributed/fake_pg.py
error: could not apply 06f62e36356... port sevearl test files under test/distributed to Intel GPU
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 06f62e36356... # port sevearl test files under test/distributed to Intel GPU

Raised by https://github.com/pytorch/pytorch/actions/runs/17481559107

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Sep 5, 2025
@daisyden
Collaborator Author

daisyden commented Sep 6, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 6, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@github-project-automation github-project-automation bot moved this from Review Required to Done in PyTorch Intel Sep 6, 2025
daisyden added a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/h100-symm-mem, ciflow/inductor, ciflow/trunk, ciflow/xpu, keep-going, Merged, module: inductor, oncall: distributed, open source, release notes: distributed (fsdp), topic: not user facing

Projects

Status: Done
