
[BE][8/N] Remove ShardedTensor from TP FSDP integration test and other tests depending on Sharded Linear#96254

Closed
fduwjj wants to merge 3 commits into gh/fduwjj/81/base from gh/fduwjj/81/head

Conversation

@fduwjj
Contributor

@fduwjj fduwjj commented Mar 8, 2023

Stack from ghstack (oldest at bottom):

We removed ShardedLinear in #95948, but that broke the TP_FSDP integration test, which was still using ShardedTensor. Migrating the test to DTensor fixes it. DTensor shards the bias too, so the test needs a small change to account for that.
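For context, here is a minimal plain-Python sketch (no torch, all names illustrative, not PyTorch APIs) of the bias-sharding difference the description mentions: with column-wise sharding on the output dimension, DTensor splits the bias across ranks along with the matching weight rows, whereas the old ShardedLinear kept the bias replicated.

```python
def linear(x, weight, bias):
    """y[i] = sum_j x[j] * weight[i][j] + bias[i]"""
    return [sum(xj * wij for xj, wij in zip(x, row)) + b
            for row, b in zip(weight, bias)]

def shard_rows(seq, world_size):
    """Split a matrix (or vector) row-wise into world_size equal shards."""
    n = len(seq) // world_size
    return [seq[r * n:(r + 1) * n] for r in range(world_size)]

x = [1.0, 2.0, 3.0]
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # 4 outputs, 3 inputs
bias = [0.5, 0.5, 0.5, 0.5]

# Column-wise parallel linear: each "rank" holds a row-shard of the weight
# AND the matching shard of the bias (the DTensor behavior the test had to
# start accounting for).
world_size = 2
w_shards = shard_rows(weight, world_size)
b_shards = shard_rows(bias, world_size)

# Each rank computes its local output slice; concatenating the slices
# (an all-gather in the real distributed setting) recovers the output of
# the unsharded linear exactly.
gathered = [y for r in range(world_size)
            for y in linear(x, w_shards[r], b_shards[r])]
assert gathered == linear(x, weight, bias)
```

The point of the sketch: because each rank adds only its own bias shard to its own output slice, a replicated-bias assertion in the old test no longer matches what DTensor produces locally.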

@pytorch-bot

pytorch-bot bot commented Mar 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96254

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Failures

As of commit 32f1858:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added topic: not user facing topic category labels Mar 8, 2023
@fduwjj fduwjj added the release notes: distributed (sharded) release notes category label Mar 8, 2023
@fduwjj fduwjj added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 8, 2023
@fduwjj fduwjj requested a review from huydhn March 8, 2023 01:14
@huydhn huydhn added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Mar 8, 2023
@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

Looks like the dynamo benchmark failure ("No CUDA GPUs are available") is not related to this PR.

fduwjj added a commit that referenced this pull request Mar 8, 2023
@fduwjj fduwjj changed the title from [BE][8/N] Remove ShardedTensor from TP FSDP integration test to [BE][8/N] Remove ShardedTensor from TP FSDP integration test and other tests depending on Sharded Linear Mar 8, 2023
@huydhn
Contributor

huydhn commented Mar 8, 2023

The new periodic multigpu failure https://hud.pytorch.org/pr/96254#11842511229 also looks related, so I guess it's another test to be updated.

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@huydhn Ahhh, no wonder I didn't see it. We need to remove that test too; we already have one for DTensor under test/distributed/. Removed it.

fduwjj added a commit that referenced this pull request Mar 8, 2023
Contributor

@huydhn huydhn left a comment


LGTM! You might see a buck failure on periodic, but it's broken in trunk at the moment, so that failure is expected.

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed: periodic / linux-bionic-cuda11.7-py3.9-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)

Details for Dev Infra team: raised by workflow job

@fduwjj
Contributor Author

fduwjj commented Mar 8, 2023

@pytorchbot merge -f "failing tests are not related to this PR."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@huydhn
Contributor

huydhn commented Mar 9, 2023

@fduwjj I have a small follow-up PR #96431 to clean up the reference to the deleted test. The multigpu job complains, as expected, that it couldn't find the deleted test.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
Pull Request resolved: pytorch/pytorch#96254
Approved by: https://github.com/huydhn
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
ydwu4 added a commit to ydwu4/pytorch that referenced this pull request Mar 13, 2023
@facebook-github-bot facebook-github-bot deleted the gh/fduwjj/81/head branch June 8, 2023 17:12
