Conversation

@daisyden
Collaborator

@daisyden daisyden commented Aug 27, 2025

For #114850, we will port distributed tests to Intel GPU. This PR is based on PRs #158533 and #159473 and covers some test files under test/distributed/fsdp. We enable Intel GPU with the following methods, keeping the original code style as much as possible:

  1. Add allow_xpu=True in instantiate_device_type_tests() where needed.

  2. Use torch.accelerator.current_accelerator() to determine the accelerator backend.

  3. Enable XPU for the affected test paths (a minimal sketch of the pattern follows this list).
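
A minimal sketch of the resulting pattern (illustrative only, not the exact diff in this PR; `ExampleDeviceTest` is a hypothetical test class):

```python
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests

# Method 2: derive the backend from torch.accelerator instead of hard-coding "cuda".
device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
print(f"running on backend: {device_type}")


class ExampleDeviceTest(TestCase):
    def test_ones(self, device):
        # `device` is filled in per backend (e.g. "cuda:0" or "xpu:0") by the harness.
        t = torch.ones(2, 2, device=device)
        self.assertEqual(t.sum().item(), 4.0)


# Method 1: allow_xpu=True opts the generated tests in to the Intel GPU (XPU) backend.
instantiate_device_type_tests(ExampleDeviceTest, globals(), allow_xpu=True)

if __name__ == "__main__":
    run_tests()
```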

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

daisyden added 28 commits July 30, 2025 12:59
…evice arg is None. Revised the _initialized checking in test_store.py and test_c10d_common.py
@pytorch-bot

pytorch-bot bot commented Aug 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161601

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8171d24 with merge base 73eb451:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

nn.AdaptiveAvgPool2d(output_size=(1, 1)),
nn.Flatten(),
)
self.device = torch.cuda.current_device()
Collaborator

self.device is unused?

Collaborator Author

Yes, I found it is unused.

Collaborator

nice!
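
As an aside (not part of this PR's diff): if such a field were ever needed, a device-agnostic equivalent of torch.cuda.current_device() could be built from the accelerator API. A hedged sketch, assuming torch.accelerator.current_device_index() is available in the PyTorch build:

```python
import torch

# Device-agnostic stand-in for torch.cuda.current_device(); illustrative only.
acc = torch.accelerator.current_accelerator()  # torch.device or None
if acc is not None:
    # current_device_index() returns the active index for the current backend
    # (cuda, xpu, ...); assumed available here.
    device = torch.device(acc.type, torch.accelerator.current_device_index())
else:
    device = torch.device("cpu")
print(device)  # e.g. "cuda:0", "xpu:0", or "cpu"
```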



device_type = acc.type if (acc := torch.accelerator.current_accelerator()) else "cpu"
device_count = torch.accelerator.device_count()
Collaborator

unused?

Collaborator

@guangyey guangyey left a comment

LGTM.

@guangyey guangyey requested a review from d4l3k September 1, 2025 08:24
@guangyey guangyey changed the title [WIP] [3/N] Enable 6 fsdp test on Intel GPU [3/N] Enable 6 fsdp test on Intel GPU Sep 1, 2025
@guangyey guangyey moved this to Review Required in PyTorch Intel Sep 1, 2025
Member

@d4l3k d4l3k left a comment

LGTM

@daisyden
Collaborator Author

daisyden commented Sep 5, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/161601/head returned non-zero exit code 1

Rebasing (1/20)
Auto-merging test/distributed/test_c10d_common.py
CONFLICT (content): Merge conflict in test/distributed/test_c10d_common.py
Auto-merging test/distributed/test_c10d_functional_native.py
CONFLICT (content): Merge conflict in test/distributed/test_c10d_functional_native.py
Auto-merging test/distributed/test_device_mesh.py
CONFLICT (content): Merge conflict in test/distributed/test_device_mesh.py
Auto-merging test/distributed/test_dynamo_distributed.py
CONFLICT (content): Merge conflict in test/distributed/test_dynamo_distributed.py
Auto-merging test/distributed/test_inductor_collectives.py
CONFLICT (content): Merge conflict in test/distributed/test_inductor_collectives.py
Auto-merging test/distributed/test_store.py
CONFLICT (content): Merge conflict in test/distributed/test_store.py
Auto-merging test/distributions/test_distributions.py
Auto-merging torch/distributed/distributed_c10d.py
CONFLICT (content): Merge conflict in torch/distributed/distributed_c10d.py
Auto-merging torch/testing/_internal/common_distributed.py
CONFLICT (content): Merge conflict in torch/testing/_internal/common_distributed.py
Auto-merging torch/testing/_internal/common_fsdp.py
Auto-merging torch/testing/_internal/distributed/fake_pg.py
CONFLICT (content): Merge conflict in torch/testing/_internal/distributed/fake_pg.py
error: could not apply 06f62e36356... port sevearl test files under test/distributed to Intel GPU
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 06f62e36356... # port sevearl test files under test/distributed to Intel GPU

Raised by https://github.com/pytorch/pytorch/actions/runs/17481559107

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Sep 5, 2025
@daisyden
Collaborator Author

daisyden commented Sep 6, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 6, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@github-project-automation github-project-automation bot moved this from Review Required to Done in PyTorch Intel Sep 6, 2025
daisyden added a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/h100-symm-mem, ciflow/inductor, ciflow/trunk, ciflow/xpu, keep-going, Merged, module: inductor, oncall: distributed, open source, release notes: distributed (fsdp), topic: not user facing

Projects

Status: Done
