DTensor fast path: port return_and_correct_aliasing and inplace/out checks#167475
swolchok wants to merge 11 commits into gh/swolchok/867/base
…out checks This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using. [ghstack-poisoned]
Dr. CI (automated comment): ✅ You can merge normally! (1 Unrelated Failure). As of commit c96e842 with merge base 780e325, one job is marked as unstable, possibly due to flakiness on trunk. Artifacts and rendered test results: hud.pytorch.org/pr/167475
…t arguments for local dispatch, and failure to return a list (was pushing multiple retvals onto stack) for list returning ops on "WIP: DTensor fast path: port return_and_correct_aliasing and inplace/out checks"
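The list-return bug named in this commit can be pictured with a toy model of a stack-based (boxed) calling convention, where an op leaves its results on a shared stack. This is only a sketch under stated assumptions, not the PR's C++ code: `push_results_buggy` and `push_results_fixed` are hypothetical names, and plain Python lists and strings stand in for the real stack and tensors. The point is that an op declared to return a list should leave exactly one list value on the stack, not one entry per element.

```python
from typing import Any, List


def push_results_buggy(stack: List[Any], results: List[Any]) -> None:
    # Hypothetical buggy behaviour: every element becomes its own stack entry,
    # so a caller expecting a single list value sees several return values.
    stack.extend(results)


def push_results_fixed(stack: List[Any], results: List[Any], returns_list: bool) -> None:
    # Hypothetical fix: a list-returning op pushes ONE entry (the list itself);
    # a multi-output op still pushes one entry per output.
    if returns_list:
        stack.append(list(results))
    else:
        stack.extend(results)


chunks = ["chunk0", "chunk1", "chunk2"]  # stand-ins for tensors

buggy_stack: List[Any] = []
push_results_buggy(buggy_stack, chunks)

fixed_stack: List[Any] = []
push_results_fixed(fixed_stack, chunks, returns_list=True)

# Number of stack entries left behind by each variant.
print(len(buggy_stack), len(fixed_stack))
```

In this model the buggy variant leaves three entries and the fixed variant leaves one, which is the shape a caller of a list-returning op would expect.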
    # simple analysis of function schema to determine
    # if this is an inplace variant, it might not
    # be entirely correct, but it's good enough for now.
    return self.op._schema.name[-1] == "_"
just putting it back since we don't need the standalone function anymore (see changes in _dispatch.py)
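The trailing-underscore check in the snippet above can be tried on plain strings. This is a minimal sketch: `looks_inplace` is a hypothetical helper, and the schema names are written by hand in the `aten::name` form rather than read from a real operator. It uses `str.endswith` instead of `name[-1]`, which behaves the same on non-empty names but does not raise on an empty string.

```python
def looks_inplace(schema_name: str) -> bool:
    """Name-based heuristic: treat a trailing underscore as 'in-place'."""
    return schema_name.endswith("_")


# Hand-written example names (assumed format: "namespace::op_name").
examples = ["aten::add", "aten::add_", "aten::detach", "aten::__and__"]
flags = {name: looks_inplace(name) for name in examples}

for name, flag in flags.items():
    print(f"{name:15} -> {flag}")
```

The last entry shows why the code comment says the check "might not be entirely correct": a dunder-style name such as `aten::__and__` also ends in an underscore, so the heuristic flags it even though, to my understanding, that op returns a new tensor.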
    }
    }
    stack->clear();
    return wrapped_result;
This is probably possible to do better now that it's in C++, but FUTURE WORK.
…comments on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks"
…ensor on "DTensor fast path: port return_and_correct_aliasing and inplace/out checks"
@pytorchbot merge -i

Merge started. Your change will be merged while ignoring the following 1 check: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable)
…hecks (pytorch#167475)

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.

Pull Request resolved: pytorch#167475
Approved by: https://github.com/ezyang
ghstack dependencies: pytorch#167051, pytorch#166372, pytorch#166808
```
git revert --no-commit 567dcdb 200156e 3d801a4 2034ca9 480b4ff f570e58
```

And Revert "[DTensor] Document fast-path dispatch (#168192)"

And Revert "[DTensor] Fix deadlock after fast cache clear (#168069)"

Reverts:
* #167860
* #167588
* #167475
* #166808
* #166372
* #168192
* #168069

Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: #168264
Approved by: https://github.com/seemethere, https://github.com/malfet
Stack from ghstack (oldest at bottom):
* DTensor.__torch_dispatch__ #167051

This seems to generate a several-microsecond performance improvement in the detach benchmark I've been using.
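For readers unfamiliar with what "correct aliasing" means for a wrapper tensor subclass, the idea can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: `Wrapper`, `classify`, and `dispatch` are hypothetical names, the name-based in-place/out checks are simplifications, and the real `return_and_correct_aliasing` helper works from the op schema and covers more cases than this toy does. The sketch only shows the contract: an in-place op hands back the very object it was given, an `out=` op hands back the `out` argument, and any other op hands back a fresh wrapper.

```python
from typing import Any, Callable, Dict, Tuple


class Wrapper:
    """Stand-in for a tensor-subclass wrapper; holds a single float."""

    def __init__(self, value: float) -> None:
        self.value = value


def classify(op_name: str, overload: str) -> str:
    # Hypothetical name-based checks (assumptions, not the PR's exact logic).
    if op_name.endswith("_"):
        return "inplace"
    if "out" in overload:
        return "out"
    return "functional"


def dispatch(
    op_name: str,
    overload: str,
    args: Tuple[Any, ...],
    kwargs: Dict[str, Any],
    compute: Callable[[], float],
) -> Wrapper:
    new_value = compute()
    kind = classify(op_name, overload)
    if kind == "inplace":
        # Mutate and return the SAME wrapper the caller passed in.
        args[0].value = new_value
        return args[0]
    if kind == "out":
        # Write into, and return, the caller-provided out wrapper.
        kwargs["out"].value = new_value
        return kwargs["out"]
    # Functional op: a fresh wrapper is fine.
    return Wrapper(new_value)


x = Wrapper(1.0)
y = dispatch("add_", "Tensor", (x,), {}, lambda: x.value + 2.0)
z = dispatch("add", "Tensor", (x,), {}, lambda: x.value + 2.0)
buf = Wrapper(0.0)
w = dispatch("add", "out", (x,), {"out": buf}, lambda: x.value + 2.0)

print(y is x, z is x, w is buf)
```

Returning a new wrapper for the in-place call would break code that relies on `x.add_(...)` returning `x` itself, which is the kind of mismatch the aliasing step exists to prevent.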
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci