
[BE][8/N] Remove ShardedTensor from TP FSDP integration test and other tests depending on Sharded Linear#96254

Closed
fduwjj wants to merge 3 commits into gh/fduwjj/81/base from gh/fduwjj/81/head

Conversation

@fduwjj
Contributor

@fduwjj fduwjj commented Mar 8, 2023

Stack from ghstack (oldest at bottom):

We removed ShardedLinear in #95948, but that broke the TP_FSDP integration test, which was still using ShardedTensor. Migrating the test to DTensor fixes it. DTensor shards the bias too, so the test needs a small change to account for that.
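For context, here is a minimal plain-Python sketch (no torch, all names illustrative, not PyTorch APIs) of the bias-sharding difference the description mentions: with column-wise sharding on the output dimension, DTensor splits the bias across ranks along with the matching weight rows, whereas the old ShardedLinear kept the bias replicated.

```python
def linear(x, weight, bias):
    """y[i] = sum_j x[j] * weight[i][j] + bias[i]"""
    return [sum(xj * wij for xj, wij in zip(x, row)) + b
            for row, b in zip(weight, bias)]

def shard_rows(seq, world_size):
    """Split a matrix (or vector) row-wise into world_size equal shards."""
    n = len(seq) // world_size
    return [seq[r * n:(r + 1) * n] for r in range(world_size)]

x = [1.0, 2.0, 3.0]
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # 4 outputs, 3 inputs
bias = [0.5, 0.5, 0.5, 0.5]

# Column-wise parallel linear: each "rank" holds a row-shard of the weight
# AND the matching shard of the bias (the DTensor behavior the test had to
# start accounting for).
world_size = 2
w_shards = shard_rows(weight, world_size)
b_shards = shard_rows(bias, world_size)

# Each rank computes its local output slice; concatenating the slices
# (an all-gather in the real distributed setting) recovers the output of
# the unsharded linear exactly.
gathered = [y for r in range(world_size)
            for y in linear(x, w_shards[r], b_shards[r])]
assert gathered == linear(x, weight, bias)
```

The point of the sketch: because each rank adds only its own bias shard to its own output slice, a replicated-bias assertion in the old test no longer matches what DTensor produces locally.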

@pytorch-bot

pytorch-bot bot commented Mar 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96254

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Failures

As of commit 32f1858:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added topic: not user facing topic category labels Mar 8, 2023
@fduwjj fduwjj added the release notes: distributed (sharded) release notes category label Mar 8, 2023
@fduwjj fduwjj added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 8, 2023
@fduwjj fduwjj requested a review from huydhn March 8, 2023 01:14
@huydhn huydhn added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Mar 8, 2023
@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

Looks like the dynamo benchmark failure ("No CUDA GPUs are available") is not related to this PR.

fduwjj added a commit that referenced this pull request Mar 8, 2023
@fduwjj fduwjj changed the title from [BE][8/N] Remove ShardedTensor from TP FSDP integration test to [BE][8/N] Remove ShardedTensor from TP FSDP integration test and other tests depending on Sharded Linear Mar 8, 2023
@huydhn
Contributor

huydhn commented Mar 8, 2023

The new periodic multigpu failure https://hud.pytorch.org/pr/96254#11842511229 also looks related, so I guess it's another test to be updated.

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@huydhn Ahhh, no wonder I didn't see it. We need to remove that test too; we already have one for DTensor under test/distributed/. Removed it.

fduwjj added a commit that referenced this pull request Mar 8, 2023
Contributor

@huydhn huydhn left a comment


LGTM! You might see a buck failure on periodic, but it's broken in trunk at the moment, so that failure is expected.

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed: periodic / linux-bionic-cuda11.7-py3.9-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)

Details for Dev Infra team: raised by workflow job

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@pytorchbot merge -f "failing tests are not related to this PR."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@huydhn
Contributor

huydhn commented Mar 9, 2023

@fduwjj I have a small follow-up PR #96431 to clean up the reference to the deleted test. The multigpu job complains, as expected, that it couldn't find the deleted test.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
Pull Request resolved: pytorch/pytorch#96254
Approved by: https://github.com/huydhn
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
ydwu4 added a commit to ydwu4/pytorch that referenced this pull request Mar 13, 2023
@facebook-github-bot facebook-github-bot deleted the gh/fduwjj/81/head branch June 8, 2023 17:12
