[DLPACK] Optimize toDLPack Conversion Speed #162111
Dr. CI: ✅ No failures as of commit 71f9d0b with merge base 8ec551b. Artifacts and rendered test results are at hud.pytorch.org/pr/162111.
Benchmark, on AMD Ryzen:
Previously, in gh-83069, the toDLPack converter introduced a normalization step that changes the stride to 1 when shape[i] == 1.
This step, however, calls as_strided during toDLPack and can slow the conversion down by about 3x: PyTorch's DLPack conversion overhead rises from under 0.2 us to around 0.6 us per call.
This PR updates the logic by adding a need_normalize_strides check that first confirms whether stride normalization is necessary. In most common cases, when the tensor is contiguous, such normalization is not necessary.
We confirmed that this additional check brings toDLPack back below 0.2 us per call, which can significantly speed up eager-mode integration of DLPack with PyTorch.
If the check detects that normalization is needed, the older path is invoked.
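The check described above can be illustrated with a minimal pure-Python sketch. This is not the code from the PR (which lives in PyTorch's C++ DLPack converter); the function signatures, the use of plain tuples for shape and strides, and the helper `normalized_strides` are assumptions made only for illustration. The idea is that a cheap scan over shape and strides decides whether the expensive normalization path has to run at all.

```python
from typing import Sequence, Tuple


def need_normalize_strides(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """Return True if some dimension of size 1 has a stride other than 1.

    Hypothetical Python rendering of the check named in the PR description;
    the real check may differ in detail.
    """
    return any(dim == 1 and stride != 1 for dim, stride in zip(shape, strides))


def normalized_strides(shape: Sequence[int], strides: Sequence[int]) -> Tuple[int, ...]:
    """Return strides with every size-1 dimension forced to stride 1.

    Fast path: when no normalization is needed, the input strides are
    returned unchanged and the per-dimension rewrite (the stand-in for the
    as_strided call in this sketch) is skipped.
    """
    if not need_normalize_strides(shape, strides):
        return tuple(strides)
    # Slow path: rewrite only the strides of size-1 dimensions.
    return tuple(1 if dim == 1 else stride for dim, stride in zip(shape, strides))


if __name__ == "__main__":
    # Contiguous 2x3 layout: no size-1 dimension, so the fast path is taken.
    print(normalized_strides((2, 3), (3, 1)))
    # A size-1 dimension carrying stride 7 triggers the normalization path.
    print(normalized_strides((4, 1), (1, 7)))
```

In this sketch the scan is a single pass over the dimensions, so its cost is small next to constructing a new strided view, which matches the motivation given in the description.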
msaroufim
left a comment
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Fixes pytorch#162113
Pull Request resolved: pytorch#162111
Approved by: https://github.com/msaroufim
Fixes #162113