DTensor fast path: port return_and_correct_aliasing and inplace/out checks by swolchok · Pull Request #167475 · pytorch/pytorch

swolchok · 2025-11-10T18:29:35Z

Stack from ghstack (oldest at bottom):

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. [ghstack-poisoned]

pytorch-bot · 2025-11-10T18:29:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167475

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c96e842 with merge base 780e325:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable) (gh) (#166072)
backends/xnnpack/test/recipes/test_xnnpack_recipes.py::TestXnnpackRecipes::test_int8_static_quant_recipe

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 5c51428 Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: a73ea43 Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 2b1895f Pull Request resolved: #167475

…t arguments for local dispatch, and failure to return a list (was pushing multiple retvals onto stack) for list returning ops on "WIP: DTensor fast path: port return_and_correct_aliasing and inplace/out checks" This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci [ghstack-poisoned]

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 7481ee3 Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 865c3fa Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: b6ec8c8 Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: d7e0e1d Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 7f18116 Pull Request resolved: #167475

ezyang · 2025-11-12T03:51:42Z

torch/distributed/tensor/_op_schema.py

+        # simple analysis of function schema to determine
+        # if this is an inplace variant, it might not
+        # be entirely correct, but it's good enough for now.
+        return self.op._schema.name[-1] == "_"


What happened here?

swolchok · 2025-11-12

Just putting it back, since we no longer need the standalone function (see the changes in _dispatch.py).
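For readers skimming this thread: the in-place check in the quoted diff is a naming heuristic that looks only at the last character of the operator's schema name. A minimal sketch of that idea follows, using plain strings rather than real operator objects. The helper name `looks_inplace` and the sample names are illustrative assumptions, not code from this PR, and `aten::__and__` is included only to show how a dunder-style name could trip the heuristic.

```python
def looks_inplace(schema_name: str) -> bool:
    """Mirror the quoted heuristic: a trailing underscore marks an in-place op."""
    return schema_name[-1] == "_"


# Sample schema names (illustrative strings, not looked up from a real registry).
samples = ["aten::add", "aten::add_", "aten::detach", "aten::__and__"]
results = {name: looks_inplace(name) for name in samples}

for name, flagged in results.items():
    print(f"{name}: in-place={flagged}")
```

The last sample is flagged as in-place only because its name ends in an underscore, which is the kind of imprecision the code comment concedes with "it might not be entirely correct".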

ezyang · 2025-11-12T03:52:39Z

torch/csrc/autograd/python_variable.cpp

+      }
+    }
+    stack->clear();
+    return wrapped_result;


This is probably possible to do better now that it's in C++, but FUTURE WORK.

…comments on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks" This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci [ghstack-poisoned]

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 1595782 Pull Request resolved: #167475

…ensor on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks" This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta msaroufim dcci [ghstack-poisoned]

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 362cf52 Pull Request resolved: #167475

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. ghstack-source-id: 7467e23 Pull Request resolved: #167475

swolchok · 2025-11-13T06:03:27Z

@pytorchbot merge -i

pytorchmergebot · 2025-11-13T06:05:21Z

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…hecks (pytorch#167475) This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. Pull Request resolved: pytorch#167475 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#167051, pytorch#166372, pytorch#166808

```
git revert --no-commit 567dcdb 200156e 3d801a4 2034ca9 480b4ff f570e58
```

And Revert "[DTensor] Document fast-path dispatch (#168192)"
And Revert "[DTensor] Fix deadlock after fast cache clear (#168069)"

Reverts:
* #167860
* #167588
* #167475
* #166808
* #166372
* #168192
* #168069

Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: #168264
Approved by: https://github.com/seemethere, https://github.com/malfet

WIP: DTensor fast path: port return_and_correct_aliasing and inplace/…

6d56f09

…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. [ghstack-poisoned]

swolchok requested review from albanD and soulitzer as code owners November 10, 2025 18:29

This was referenced Nov 10, 2025

Add C++ fast path for DTensor.__torch_dispatch__ #167051

Closed

Avoid creating Python OpSchema in the DTensor dispatch fast path #166372

Closed

swolchok mentioned this pull request Nov 8, 2025

extend C++ DTensor fast path to local operator dispatch #166808

Closed

pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Nov 10, 2025

swolchok marked this pull request as draft November 10, 2025 18:30

swolchok added the release notes: distributed (dtensor) label Nov 10, 2025

swolchok mentioned this pull request Nov 11, 2025

Use _StridedShard to replace shard_order field in DTensorSpec #167300

Open

swolchok marked this pull request as ready for review November 11, 2025 20:52

swolchok changed the title ~~WIP: DTensor fast path: port return_and_correct_aliasing and inplace/out checks~~ DTensor fast path: port return_and_correct_aliasing and inplace/out checks Nov 11, 2025

swolchok requested review from XilunWu and wconstab November 11, 2025 23:05

swolchok added the ciflow/trunk label Nov 11, 2025

ezyang requested a review from bdhirsh November 12, 2025 03:50

ezyang reviewed Nov 12, 2025

ezyang approved these changes Nov 12, 2025

pytorchmergebot added the merging label Nov 13, 2025

pytorchmergebot added the Merged label Nov 13, 2025

pytorchmergebot closed this in 3d801a4 Nov 13, 2025

pytorchmergebot removed the merging label Nov 13, 2025

ezyang mentioned this pull request Nov 20, 2025

Revert C++ fastpath dispatch path for DTensor #168264

Closed

github-actions bot deleted the gh/swolchok/867/head branch December 14, 2025 02:21

swolchok wants to merge 11 commits into gh/swolchok/867/base from gh/swolchok/867/head

3 participants
