davidberard98 (Contributor) commented Aug 22, 2022

Stack from ghstack (oldest at bottom):

We are seeing an OOM in #83239; this change would help determine whether the issue lies with the infra or with the test.

RUN_TORCHBENCH: nvfuser

facebook-github-bot (Contributor) commented Aug 22, 2022

❌ 1 New Failures

As of commit b60bfee (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job: CircleCI Checks build
Step: Unknown

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

davidberard98 added a commit that referenced this pull request Aug 22, 2022
ghstack-source-id: c9f07d6
Pull Request resolved: #83857
davidberard98 requested review from xuzhao9 and removed request for a team and xuzhao9 August 22, 2022 17:46
davidberard98 marked this pull request as draft August 22, 2022 18:00
davidberard98 requested a review from xuzhao9 August 22, 2022 21:30
davidberard98 (Contributor Author)

@xuzhao9 Do you know why torchbench is OOMing here? I tried running this on the AWS cluster on A100s, but I couldn't repro the OOM issue.

xuzhao9 (Contributor) commented Aug 22, 2022

@davidberard98 The runner has 8x NVIDIA T4 GPUs (16 GB each), not A100s; maybe that's the difference?
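
One way to confirm which GPUs a runner exposes, and how much memory each one has, is to query `nvidia-smi` before the benchmark starts. The sketch below is not what this PR adds; it assumes `nvidia-smi` is on the PATH, and the helper names `parse_gpu_rows` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'name, memory.total' rows (memory in MiB) from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, total_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(total_mib)))
    return gpus


def query_gpus():
    """Return [(name, total_mib), ...], or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for index, (name, total_mib) in enumerate(query_gpus()):
        print(f"GPU {index}: {name}, {total_mib / 1024:.1f} GiB total")
```

Logging this at the start of a job would make a hardware mismatch, such as T4s on the runner versus A100s on a developer cluster, visible in the CI output.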

davidberard98 (Contributor Author)

@pytorchbot rebase -s

pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased gh/davidberard98/141/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/83857)

davidberard98 added a commit that referenced this pull request Aug 24, 2022
ghstack-source-id: 685b989
Pull Request resolved: #83857
davidberard98 added a commit that referenced this pull request Aug 24, 2022
ghstack-source-id: 397272e
Pull Request resolved: #83857
davidberard98 added a commit that referenced this pull request Aug 25, 2022
ghstack-source-id: a57984d
Pull Request resolved: #83857
davidberard98 marked this pull request as ready for review August 26, 2022 18:04
davidberard98 requested a review from a team August 26, 2022 18:04
. "${HOME}"/anaconda3/etc/profile.d/conda.sh
conda activate pr-ci
python3 pytorch/.github/scripts/run_torchbench.py \
# python3 -c "import torch; torch.rand((4, 4), device='cuda')"
Contributor

Looks like we can remove this? I am also okay with keeping it and uncommenting it, just to make sure the hardware works.

Contributor Author

Thanks for catching that; I will remove it. It doesn't actually work because PyTorch isn't built at this point.
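
The failure mode described here, running a CUDA smoke test before PyTorch can be imported, could be turned into a reported skip instead of a crash. A minimal sketch, assuming the check runs as a standalone Python script; `cuda_smoke_test` and its status strings are hypothetical and not part of the workflow under review.

```python
import importlib


def cuda_smoke_test(module_name="torch"):
    """Allocate a small CUDA tensor if the module imports; return a status string."""
    try:
        torch = importlib.import_module(module_name)
    except ImportError:
        # Covers both "not installed" and "source tree present but not built yet".
        return "skipped: module not importable"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"
    # Same allocation as the commented-out one-liner in the diff.
    torch.rand((4, 4), device="cuda")
    return "ok"


if __name__ == "__main__":
    print(cuda_smoke_test())
```

A check like this only makes sense at a point in the job where PyTorch has already been built or installed; earlier than that it always reports a skip.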

davidberard98 added a commit that referenced this pull request Aug 27, 2022
ghstack-source-id: 20d277a
Pull Request resolved: #83857
davidberard98 (Contributor Author)

@pytorchbot rebase -s

pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

pytorch-bot bot added the release notes: releng release notes category label Aug 30, 2022
pytorchmergebot (Collaborator)

Successfully rebased gh/davidberard98/141/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/83857)

pytorchmergebot pushed a commit that referenced this pull request Aug 30, 2022
ghstack-source-id: fc150ae
Pull Request resolved: #83857
davidberard98 (Contributor Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

github-actions (Contributor)

Hey @davidberard98.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Sep 1, 2022
Summary:
Seeing an OOM in #83239, this would help understand whether the issue is with the infra or with the test.

RUN_TORCHBENCH: nvfuser

Pull Request resolved: #83857
Approved by: https://github.com/xuzhao9

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/71d99662a0d7f8a9ad68999c9a014b71591cbb68

Reviewed By: mehtanirav

Differential Revision: D39172015

Pulled By: davidberard98

fbshipit-source-id: 208f7d8bf00937a459bb5abd5baf9461660d19c3
facebook-github-bot deleted the gh/davidberard98/141/head branch September 3, 2022 14:20