[pipelining] lazy shape inference for stage #130856

H-Huang · 2024-07-16T20:43:24Z

Stack from ghstack (oldest at bottom):

-> [pipelining] lazy shape inference for stage #130856

cc @XilunWu @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

[ghstack-poisoned]

pytorch-bot · 2024-07-16T20:43:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130856

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job

As of commit 9a83705 with merge base c549629:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for torch/_functorch/_aot_autograd/subclass_utils.py:
pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 3, 3, linux.8xlarge.nvidia.gpu) (gh)
'test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass0'

CANCELLED JOB - The following job was cancelled. Please retry:

Check Labels (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions · 2024-09-14T23:34:04Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

wconstab · 2024-09-17T22:57:13Z

torch/distributed/pipelining/schedules.py

+        for stage in self._stages:
+            if not stage.buffers_initialized:
+                logger.debug("init_buffers for %s", stage.stage_index)
+                stage.init_buffers(self._n_microbatches, args_split[0], kwargs_split[0])


Is it safe to call init_buffers one stage at a time for looped schedules? I guess this would work for 'looped' schedules but fail for V-shaped schedules, because stage.init_buffers will block until stage+1.init_buffers is called, right?
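The following toy model of the ordering concern raised above is written in plain Python. It assumes, without confirmation from the reviewed code, that initializing stage s blocks until stage s + 1 has entered its own init, and that each rank initializes its local stages sequentially, as the loop in the diff does. `simulate_init` is a hypothetical helper and is not part of torch.distributed.pipelining.

```python
from typing import Dict, List, Set


def simulate_init(rank_stages: Dict[int, List[int]], num_stages: int) -> bool:
    """Return True if every rank finishes its init loop, False if it deadlocks.

    Toy blocking model (an assumption for illustration only):
    - each rank initializes its local stages one at a time, in list order;
    - init of stage s cannot finish until stage s + 1 has *entered* init;
    - the last stage (num_stages - 1) never waits.
    """
    pos = {rank: 0 for rank in rank_stages}  # index of the stage each rank is on
    entered: Set[int] = set()
    progress = True
    while progress:
        progress = False
        # Every rank that is not done has entered init for its current stage.
        for rank, stages in rank_stages.items():
            if pos[rank] < len(stages):
                entered.add(stages[pos[rank]])
        # A rank advances once its current stage is unblocked.
        for rank, stages in rank_stages.items():
            if pos[rank] < len(stages):
                s = stages[pos[rank]]
                if s == num_stages - 1 or s + 1 in entered:
                    pos[rank] += 1
                    progress = True
    return all(pos[rank] == len(stages) for rank, stages in rank_stages.items())


# 2 ranks, 4 stages.
looped = {0: [0, 2], 1: [1, 3]}    # rank r holds stages r, r + 2
v_shape = {0: [0, 3], 1: [1, 2]}   # rank 0 holds the first and last stage

print("looped :", simulate_init(looped, 4))   # looped : True
print("v-shape:", simulate_init(v_shape, 4))  # v-shape: False
```

Under this assumption, the looped placement puts every stage's successor on the other rank, so the two ranks unblock each other in turn. In the V-shaped placement, rank 1 blocks in stage 1 waiting for stage 2. Stage 2 sits later in rank 1's own loop, so rank 1 never reaches it.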


Avoid allocating memory or dry-running the submodule during stage init. Save user-provided input/output metadata during stage init so that buffers can be initialized lazily before the first step call. Later, we plan to build on top of this to add lazy shape inference (#130856), so that no input/output shapes are required at stage init. For now, input/output tensors are still required for stage init, but they should be on the meta device, and the stage should not allocate any real memory.

Note: this needs more thorough testing and review, but it worked on the torchtitan 3D test.

TODO:
- Delete the 'device' arg from the PipelineStage ctor? (Infer it instead from the args tensors passed to the first step call? Separate PR.)
- Delete 'output_args' from the PipelineStage ctor? We don't actually need it, but we use it for shape validation, which is why I didn't remove it in this PR. Proposal: leave it until we add lazy shape inference?

Fixes #136225, #136226

Pull Request resolved: #136243
Approved by: https://github.com/H-Huang, https://github.com/kwen2501
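The following minimal sketch of the lazy-initialization pattern described above is written in plain Python, so it runs without a GPU or a process group. `TensorMeta`, `LazyStage`, `step` and `fake_alloc` are hypothetical stand-ins. The real `PipelineStage.init_buffers` takes example microbatch args/kwargs, as in the reviewed diff, and allocates real tensors. This sketch does not model either of those.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Shape and dtype only; a stand-in for a meta-device tensor (hypothetical)."""
    shape: Tuple[int, ...]
    dtype: str


class LazyStage:
    """Hypothetical stage: records metadata at construction, allocates on first step."""

    def __init__(self, stage_index: int, input_meta: TensorMeta,
                 allocate: Callable[[TensorMeta], object]) -> None:
        self.stage_index = stage_index
        self._input_meta = input_meta      # metadata only; no memory allocated here
        self._allocate = allocate
        self._recv_buffers: Optional[List[object]] = None

    @property
    def buffers_initialized(self) -> bool:
        return self._recv_buffers is not None

    def init_buffers(self, n_microbatches: int) -> None:
        # One receive buffer per microbatch, allocated only now.
        self._recv_buffers = [self._allocate(self._input_meta)
                              for _ in range(n_microbatches)]


def step(stages: List[LazyStage], n_microbatches: int) -> None:
    # Same shape as the loop in the reviewed diff: initialize once, lazily.
    for stage in stages:
        if not stage.buffers_initialized:
            stage.init_buffers(n_microbatches)
    # ... run the pipeline schedule here (omitted) ...


allocations: List[TensorMeta] = []


def fake_alloc(meta: TensorMeta) -> bytearray:
    allocations.append(meta)
    return bytearray(4 * math.prod(meta.shape))  # assumes 4-byte elements


stages = [LazyStage(i, TensorMeta((2, 8), "float32"), fake_alloc) for i in (0, 2)]
assert not allocations            # construction allocated nothing
step(stages, n_microbatches=4)
assert len(allocations) == 8      # 2 local stages x 4 microbatches
step(stages, n_microbatches=4)
assert len(allocations) == 8      # later steps reuse the buffers
```

Keeping only shape and dtype at construction is what allows the stage to be built from meta-device tensors. The allocation cost moves to the first step call. Per the message above, that first step is also where lazy shape inference is planned to supply the metadata instead of requiring it at init.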

[pipelining] lazy shape inference for stage

9a83705

[ghstack-poisoned]

pytorch-bot bot added the oncall: distributed label Jul 16, 2024

H-Huang added a commit that referenced this pull request Jul 16, 2024

[pipelining] lazy shape inference for stage

8caec92

ghstack-source-id: b42ebcf Pull Request resolved: #130856

github-actions bot added the Stale label Sep 14, 2024

H-Huang added no-stale and removed Stale labels Sep 15, 2024

wconstab reviewed Sep 17, 2024


This was referenced Sep 17, 2024

[pipelining] try not to dry run module when creating PipelineStage? #136226 (Open)

[Pipelining] Make PipelineStage support meta initialization #136243 (Closed)

wconstab mentioned this pull request Sep 27, 2024

[rfc] [pipelining] shape inference + cached buffer allocation #136811 (Closed)
