DTensor fast path: port return_and_correct_aliasing and inplace/out checks#167475

Closed
swolchok wants to merge 11 commits into gh/swolchok/867/base from gh/swolchok/867/head

swolchok (Contributor) commented Nov 10, 2025

…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

[ghstack-poisoned]
pytorch-bot bot commented Nov 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167475

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c96e842 with merge base 780e325:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Nov 10, 2025
swolchok added a commit that referenced this pull request Nov 10, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 5c51428
Pull Request resolved: #167475
swolchok marked this pull request as draft November 10, 2025 18:30
swolchok added the release notes: distributed (dtensor) label Nov 10, 2025
…h: port return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 10, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: a73ea43
Pull Request resolved: #167475
…rect_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 10, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 2b1895f
Pull Request resolved: #167475
…t arguments for local dispatch, and failure to return a list (was pushing multiple retvals onto stack) for list returning ops on "WIP: DTensor fast path: port return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 11, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 7481ee3
Pull Request resolved: #167475
… path: port return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 11, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 865c3fa
Pull Request resolved: #167475
…rn_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 11, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: b6ec8c8
Pull Request resolved: #167475
…eturn_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 11, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: d7e0e1d
Pull Request resolved: #167475
swolchok marked this pull request as ready for review November 11, 2025 20:52
swolchok changed the title from "WIP: DTensor fast path: port return_and_correct_aliasing and inplace/out checks" to "DTensor fast path: port return_and_correct_aliasing and inplace/out checks" Nov 11, 2025
swolchok requested review from XilunWu and wconstab November 11, 2025 23:05
swolchok added the ciflow/trunk label Nov 11, 2025
…and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 12, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 7f18116
Pull Request resolved: #167475
ezyang requested a review from bdhirsh November 12, 2025 03:50
# simple analysis of function schema to determine
# if this is an inplace variant, it might not
# be entirely correct, but it's good enough for now.
return self.op._schema.name[-1] == "_"
Contributor

What happened here?

Contributor Author

just putting it back since we don't need the standalone function anymore (see changes in _dispatch.py)
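
For context, the heuristic in the snippet under review can be sketched in isolation as below. This is only a minimal sketch that operates on plain strings rather than a real `OpOverload`. The helper names `looks_inplace` and `looks_out_variant` are hypothetical, and the overload-name test for out variants is an assumption about how such a check might look, not something shown in this thread.

```python
def looks_inplace(schema_name: str) -> bool:
    # Same idea as `self.op._schema.name[-1] == "_"` in the snippet above
    # (equivalent for non-empty names): ATen in-place variants
    # conventionally end in a trailing underscore.
    return schema_name.endswith("_")


def looks_out_variant(overload_name: str) -> bool:
    # Assumption (not taken from this thread): out variants carry "out"
    # somewhere in the overload name.
    return "out" in overload_name


print(looks_inplace("aten::add_"))     # in-place by naming convention
print(looks_inplace("aten::add"))      # functional variant
print(looks_inplace("aten::__and__"))  # dunder name also ends in "_"
print(looks_out_variant("out"))
```

The third call hints at why the original comment says the check "might not be entirely correct": a name such as `aten::__and__` ends in an underscore even though it does not follow the in-place naming convention.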

}
}
stack->clear();
return wrapped_result;
Contributor

This is probably possible to do better now that it's in C++, but FUTURE WORK.

…comments on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 12, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 1595782
Pull Request resolved: #167475
…ensor on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 12, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 362cf52
Pull Request resolved: #167475
…ort return_and_correct_aliasing and inplace/out checks"

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Nov 13, 2025
…out checks

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

ghstack-source-id: 7467e23
Pull Request resolved: #167475
swolchok (Contributor Author)

@pytorchbot merge -i

pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)


Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
…hecks (pytorch#167475)

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

Pull Request resolved: pytorch#167475
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#167051, pytorch#166372, pytorch#166808
pytorchmergebot pushed a commit that referenced this pull request Nov 21, 2025
```
git revert --no-commit 567dcdb 200156e 3d801a4 2034ca9 480b4ff f570e58
```

    And Revert "[DTensor] Document fast-path dispatch (#168192)"
    And Revert "[DTensor] Fix deadlock after fast cache clear (#168069)"

Reverts:
* #167860
* #167588
* #167475
* #166808
* #166372
* #168192
* #168069

Signed-off-by: Edward Z. Yang

Pull Request resolved: #168264
Approved by: https://github.com/seemethere, https://github.com/malfet
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
```
git revert --no-commit 567dcdb 200156e 3d801a4 2034ca9 480b4ff f570e58
```

    And Revert "[DTensor] Document fast-path dispatch (#168192)"
    And Revert "[DTensor] Fix deadlock after fast cache clear (#168069)"

Reverts:
* #167860
* #167588
* #167475
* #166808
* #166372
* #168192
* #168069

Signed-off-by: Edward Z. Yang

Pull Request resolved: #168264
Approved by: https://github.com/seemethere, https://github.com/malfet
github-actions bot deleted the gh/swolchok/867/head branch December 14, 2025 02:21
Labels

ciflow/inductor, ciflow/trunk, Merged, oncall: distributed, release notes: distributed (dtensor)
