Closed
Summary: Adding an out variant for `permute`. It's better than fixing the copy inside `contiguous` because 1) we can leverage the c2 math library, 2) `contiguous` creates a tensor inside the function which isn't managed by the MemoryPlanner in StaticRuntime

Test Plan: Benchmark:
```
After:
I1214 12:35:32.218775 991920 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0902339. Iters per second: 11082.3
Before:
I1214 12:35:43.368770 992620 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0961521. Iters per second: 10400.2
```

Differential Revision: D25541666

fbshipit-source-id: 44e74c660d329b3577bf27b167e426e1d7337915
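For readers unfamiliar with the term, an "out variant" writes its result into a buffer the caller already owns instead of allocating a new one, which is what lets a memory planner manage that buffer. The following is a minimal pure-Python sketch of the idea on flat row-major lists. It is not the code from this PR: `permute_out` and `contiguous_strides` are hypothetical names chosen for illustration, and the sketch does not use the c2 math library or any PyTorch API.

```python
from math import prod
from typing import List, Sequence


def contiguous_strides(shape: Sequence[int]) -> List[int]:
    """Row-major strides (in elements) for a dense tensor of this shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def permute_out(src: Sequence, shape: Sequence[int],
                dims: Sequence[int], out: list) -> List[int]:
    """Hypothetical out variant: copy `src`, permuted by `dims`, into `out`.

    `src` is a flat row-major buffer with the given `shape`. `out` is a
    caller-owned flat buffer of the same length; nothing is allocated for
    the result here. Returns the shape of the permuted tensor.
    """
    if sorted(dims) != list(range(len(shape))):
        raise ValueError("dims must be a permutation of the axes")
    if len(src) != prod(shape) or len(out) != len(src):
        raise ValueError("buffer sizes must match the shape")

    out_shape = [shape[d] for d in dims]
    src_strides = contiguous_strides(shape)
    # Stride in the source buffer for each axis of the output.
    perm_strides = [src_strides[d] for d in dims]

    for flat in range(len(out)):
        # Decompose the flat output index into a multi-index over
        # out_shape and accumulate the matching source offset.
        rem, offset = flat, 0
        for axis in range(len(out_shape) - 1, -1, -1):
            rem, idx = divmod(rem, out_shape[axis])
            offset += idx * perm_strides[axis]
        out[flat] = src[offset]
    return out_shape


# Usage: the caller allocates `out` once and can reuse it across calls.
src = list(range(6))              # a 2x3 tensor: [[0, 1, 2], [3, 4, 5]]
out = [None] * 6                  # preallocated, caller-managed buffer
new_shape = permute_out(src, [2, 3], [1, 0], out)
print(new_shape, out)
```

Because the caller owns `out`, the same buffer can be reused on every iteration, whereas a copy made inside a `contiguous`-style call allocates its own result each time.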
Contributor
💊 CI failures summary and remediations
As of commit 77ab3f5 (more details on the Dr. CI page):
Extra GitHub checks: 1 failed
This comment was automatically generated by Dr. CI. This comment has been revised 5 times.
Contributor
This pull request was exported from Phabricator. Differential Revision: D25541666
Codecov Report
@@            Coverage Diff             @@
##           master   #49447      +/-   ##
==========================================
- Coverage   80.62%   80.62%   -0.01%
==========================================
  Files        1875     1875
  Lines      202769   202769
==========================================
- Hits       163486   163485       -1
- Misses      39283    39284       +1
Contributor
This pull request has been merged in cbeb4c2.
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary: Pull Request resolved: pytorch#49447

Adding an out variant for `permute`. It's better than fixing the copy inside `contiguous` because 1) we can leverage the c2 math library, 2) `contiguous` creates a tensor inside the function which isn't managed by the MemoryPlanner in StaticRuntime

Test Plan: Benchmark:
```
After:
I1214 12:35:32.218775 991920 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0902339. Iters per second: 11082.3
Before:
I1214 12:35:43.368770 992620 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0961521. Iters per second: 10400.2
```

Reviewed By: yinghai

Differential Revision: D25541666

fbshipit-source-id: 013ed0d4080cd01de4d3e1b031ab51e5032e6651