[DLPACK] Optimize toDLPack Conversion Speed #162111

tqchen · 2025-09-04T00:15:19Z

Previously, in gh-83069, the toDLPack converter introduced a normalization step that sets the stride to 1 for every dimension where shape[i] == 1.
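For readers unfamiliar with the normalization, here is a minimal pure-Python sketch of the idea. It is not the C++ code in PyTorch: the function name `normalize_strides` is hypothetical, and the `size < 2` condition follows the title of the referenced PR #83158 rather than anything shown in this thread. The rewrite is safe because the only valid index along a size-1 dimension is 0, so that dimension's stride never contributes to an element's address.

```python
def normalize_strides(shape, strides):
    """Return a copy of `strides` with entries set to 1 wherever the
    corresponding dimension holds fewer than 2 elements.

    Hypothetical helper for illustration only; it models the idea of the
    normalization step, not PyTorch's actual implementation.
    """
    return tuple(1 if size < 2 else stride for size, stride in zip(shape, strides))


# A (1, 4) view whose leading stride is an arbitrary value (100).
shape, strides = (1, 4), (100, 1)
normalized = normalize_strides(shape, strides)
print(normalized)  # (1, 1)
```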

This step, however, calls as_strided during toDLPack and slows the conversion by about 3x: the overhead of PyTorch's DLPack conversion rises from under 0.2 us to around 0.6 us per call.

This PR adds a need_normalize_strides check that first confirms whether stride normalization is necessary. In the most common case, when the tensor is contiguous, no normalization is needed.

We confirmed that this additional check brings toDLPack back below 0.2 us per call, which can significantly speed up eager-mode integration of DLPack with PyTorch.

If the check detects that normalization is needed, the older path is invoked.
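The check-then-fallback structure described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the PR's C++ code: the exact predicate used in the PR is not shown in this thread, so the condition here (a dimension with fewer than 2 elements whose stride is not already 1) is an assumption, and `export_strides` and the `stats` counter are hypothetical names added only to show which path runs.

```python
def need_normalize_strides(shape, strides):
    """Cheap scan: True only if some dimension with fewer than 2 elements
    has a stride other than 1. The predicate is an assumption made for
    this sketch."""
    return any(size < 2 and stride != 1 for size, stride in zip(shape, strides))


def export_strides(shape, strides, stats):
    """Return the strides to export, recording which path was taken.

    `stats` is a hypothetical counter dict used only for illustration.
    """
    if not need_normalize_strides(shape, strides):
        # Fast path: nothing to rewrite, so no new view has to be built.
        stats["fast"] += 1
        return tuple(strides)
    # Slow path: stands in for the older as_strided-based normalization.
    stats["slow"] += 1
    return tuple(1 if size < 2 else stride for size, stride in zip(shape, strides))


stats = {"fast": 0, "slow": 0}
print(export_strides((3, 4), (4, 1), stats))    # (4, 1)
print(export_strides((1, 4), (100, 1), stats))  # (1, 1)
print(stats)  # {'fast': 1, 'slow': 1}
```

The point of the design is that the scan costs a few integer comparisons per dimension, while the expensive view construction runs only for the tensors that actually need it.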

Fixes #162113

pytorch-bot · 2025-09-04T00:15:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162111

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 71f9d0b with merge base 8ec551b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2025-09-04T00:15:27Z

The committers listed above are authorized under a signed CLA.

✅ login: tqchen / name: Tianqi Chen (71f9d0b)

tqchen · 2025-09-04T00:20:05Z

Benchmark, on AMD Ryzen:

torch.utils.dlpack.to_dlpack[old]    6.162643432617187e-07 sec/call
to_dlpack[this PR]                   1.8970966339111327e-07 sec/call
numpy.__dlpack__                     8.518695831298828e-08 sec/call
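The thread does not show how these numbers were collected. One way to measure per-call overhead of this kind is sketched below; the helper `seconds_per_call`, the tensor size, and the iteration counts are assumptions made for illustration, and results will vary with hardware and library versions.

```python
import timeit


def seconds_per_call(fn, number=100_000, repeat=5):
    """Best-of-`repeat` average wall time of a single call to `fn`, in seconds."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


def main():
    try:
        import numpy as np
        import torch
        from torch.utils.dlpack import to_dlpack
    except ImportError:
        print("torch and numpy are required for this benchmark; skipping")
        return

    t = torch.arange(16.0)  # small contiguous tensor, so conversion overhead dominates
    a = np.arange(16.0)
    print("torch.utils.dlpack.to_dlpack", seconds_per_call(lambda: to_dlpack(t)), "sec/call")
    print("numpy.__dlpack__", seconds_per_call(lambda: a.__dlpack__()), "sec/call")


if __name__ == "__main__":
    main()
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the quantity being measured is well under a microsecond.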

tqchen · 2025-09-04T00:29:24Z

cc @mattip @rgommers @albanD @msaroufim

msaroufim · 2025-09-04T02:42:58Z

@pytorchbot merge

pytorchmergebot · 2025-09-04T02:47:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Fixes pytorch#162113
Pull Request resolved: pytorch#162111
Approved by: https://github.com/msaroufim

pytorchbot added the open source label Sep 4, 2025

tqchen mentioned this pull request Sep 4, 2025

Normalize DLPack stride to 1 where shape < 2 #83158

Closed

eqy added module: dlpack topic: not user facing topic category labels Sep 4, 2025

eqy requested a review from albanD September 4, 2025 00:31

tqchen force-pushed the dlpack branch from 11331d3 to 71f9d0b Compare September 4, 2025 00:56

msaroufim self-requested a review September 4, 2025 02:36

msaroufim approved these changes Sep 4, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 4, 2025

pytorchmergebot added the merging label Sep 4, 2025

pytorchmergebot added the Merged label Sep 4, 2025

pytorchmergebot closed this in 8906266 Sep 4, 2025

pytorchmergebot removed the merging label Sep 4, 2025

This was referenced Sep 11, 2025

[RFC] Intrusive Caching DLPack for Fast Conversion #162630

Closed

[RFC] Bring up DLPack C Functions for Speedup and Streamline Exchange #162845

Open
