
Conversation

@ysiraichi (Collaborator) commented Mar 28, 2025:

Stack from ghstack (oldest at bottom):

This PR introduces the rest of the keyword arguments added in DLPack
version 2023.12: `dl_device` and `copy`.

In summary, we handle these arguments in the C++ implementation of
`to_dlpack(...)` at _torch/csrc/Module.cpp_, by calling the
`maybeCopyTensor` function at _aten/src/ATen/DLConvertor.cpp_. This PR also
introduces the following changes:

  • Add a new C++ API `torchDeviceToDLDevice()`, which is simply a
    refactoring of the `getDLDevice()` function at
    _aten/src/ATen/DLConvertor.cpp_.
  • Add both keyword arguments to the `from_dlpack()` function at
    _torch/utils/dlpack.py_ and to the `Tensor.__dlpack__()` dunder
    method.
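For illustration, the intended semantics at the Python level (a minimal sketch; `dl_device` takes a `(device_type, device_id)` tuple as returned by `__dlpack_device__()`, and `1` is `kDLCPU` in _dlpack.h_):

```python
import torch

x = torch.arange(4)

# Ask the producer for a capsule on a given DLPack device, forcing a copy.
# dl_device is a (device_type, device_id) tuple; (1, 0) is kDLCPU, index 0.
capsule = x.__dlpack__(dl_device=(1, 0), copy=True)

y = torch.from_dlpack(capsule)
y[0] = 42                # copy=True: mutating y must not affect x
assert x[0].item() == 0
```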

@pytorch-bot bot commented Mar 28, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150218

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 65d6b8f with merge base 7cc1a95:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ysiraichi added a commit that referenced this pull request Mar 28, 2025
ghstack-source-id: d2b2c14
Pull Request resolved: #150218
@ysiraichi ysiraichi added module: dlpack release notes: python_frontend python frontend release notes category labels Mar 28, 2025
@ysiraichi ysiraichi requested review from albanD and rgommers March 28, 2025 21:15
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ghstack-source-id: 96bf013
Pull Request resolved: pytorch/pytorch#150218
```cpp
ctx.device_id = static_cast<int32_t>(static_cast<unsigned char>(device_id));
switch (tensor.device().type()) {

ctx.device_id = (device.is_cuda() || device.is_privateuseone())
```
@albanD (Collaborator) commented May 14, 2025:

Why this new logic? Why would we ignore the device index being passed in?

@ysiraichi (Collaborator, Author) replied:

I just took it from the place where it was being called before. I needed it factored out, since I'm now also calling this function from elsewhere.

@albanD (Collaborator) replied:

Not sure what you mean by this?

@ysiraichi (Collaborator, Author) replied Jun 20, 2025:

Previously, this is how we were calling this function (the only call site):

```cpp
c10::DeviceIndex device_id = 0;
if (src.is_cuda() || src.is_privateuseone()) {
  device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.device = getDLDevice(src, device_id);
```

So I basically moved that code into this function. In other words, the logic did not change: the device index will always be either `src.get_device()` (i.e. `src.device().index()`) if the condition `src.is_cuda() || src.is_privateuseone()` is true, or 0 otherwise.
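For reference, the resulting DLPack device tuple is visible from Python:

```python
import torch

x = torch.ones(2)             # CPU tensor
print(x.__dlpack_device__())  # (1, 0): kDLCPU with device index 0

if torch.cuda.is_available():
    y = torch.ones(2, device="cuda")
    print(y.__dlpack_device__())  # (2, 0): kDLCUDA with the tensor's index
```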

@albanD (Collaborator) replied:

Ok, I guess it is another case where we should have an issue and tag other accelerator backends.

```cpp
      tensor, at::DLPackTraits<T>::capsule, DLPack_Capsule_Destructor<T>);
}

return nullptr;
```
@albanD (Collaborator) commented:

I don't recall: will this have the right Python error set?

@ysiraichi (Collaborator, Author) replied:

I think so. At least, it will throw an error inside the parser. I will replace it with Py_RETURN_NONE, just for consistency.

@ysiraichi (Collaborator, Author) added:

I think I hadn't understood your question earlier. If you are asking whether a C++ exception will be mapped to the correct Python error set, the answer is: yes! `END_HANDLE_TH_ERRORS` will take care of that. Specifically, the `torch::translate_exception_to_python(std::current_exception())` call will do the job.
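For illustration, any C++-side error already surfaces as a regular Python exception; the trigger below is arbitrary:

```python
import torch

try:
    torch.ones(2).to("not_a_device")  # arbitrary trigger for a C++-side error
except RuntimeError as e:
    # The C++ exception was translated into a Python RuntimeError by the
    # HANDLE_TH_ERRORS / END_HANDLE_TH_ERRORS machinery.
    print(type(e).__name__)  # RuntimeError
```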

@albanD (Collaborator) replied:

Returning None here is completely different; these are in no way interchangeable.
Returning nullptr from these APIs means that something went wrong and the caller should check the globally set error for more info. Returning None means that all went well and the result is "None".
You either want one or the other :D
Or maybe you're saying this is dead code?

@ysiraichi (Collaborator, Author) replied:

This is essentially dead code.
On second thought, it would be better to just `TORCH_INTERNAL_ASSERT(r.idx == 0);`. Then there would be no need to return None, since the bug would be in the arg parser.

kwargs["dl_device"] = device

ext_device = ext_tensor.__dlpack_device__()
# ext_device is either CUDA or ROCm, we need to pass the current
@albanD (Collaborator) commented:

Note that a lot of this string handling should get generalized to any device that is an accelerator in PT.
OK to do later, but since we're going to see issues with hip or xpu, we should just refactor it together.

@ysiraichi (Collaborator, Author) replied:

Could you help me understand what string handling you are referring to? Doesn't torch.device(device) work with any accelerator in PT?

@albanD (Collaborator) replied Jun 16, 2025:

Only that the custom processing is done only for CUDA and ROCm, but we have quite a few other accelerators that would require the same treatment in the future.
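For context, a condensed paraphrase of that CUDA/ROCm-only branch (a sketch, not the exact source; the helper name is made up):

```python
import torch

# DLPack device-type codes from dlpack.h: kDLCUDA == 2, kDLROCM == 10.
kDLCUDA, kDLROCM = 2, 10

def _stream_kwargs_for(ext_tensor):
    """Hypothetical helper paraphrasing the branch under discussion: only
    CUDA/ROCm producers get a synchronization stream passed to __dlpack__;
    every other accelerator currently falls through with no stream."""
    kwargs = {}
    device_type, device_id = ext_tensor.__dlpack_device__()
    if device_type in (kDLCUDA, kDLROCM):
        stream = torch.cuda.current_stream(device=device_id)
        # The array API standard reserves stream value 1 for CUDA's default
        # legacy stream; otherwise the raw stream pointer is passed.
        if device_type == kDLCUDA and stream.cuda_stream == 0:
            kwargs["stream"] = 1
        else:
            kwargs["stream"] = stream.cuda_stream
    return kwargs
```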

@ysiraichi (Collaborator, Author) replied:

Since I'm not familiar with these other accelerators, I think I will leave it for a future PR. In this one, I'm simply adding support for the extra keywords (in this case, passing them through to the `__dlpack__()` method). What do you think?

@albanD (Collaborator) replied:

Yes, let's just open an issue for it and add the xpu and privateuse1 labels on it.

ysiraichi added 7 commits May 24, 2025 11:55
@ysiraichi (Collaborator, Author) commented:

Summarizing the current status of this PR:

  1. We are currently waiting for @EikanWang's feedback on why we need the data pointer in order to get a torch device
  2. TODO: open an issue for XPU and PrivateUse1, regarding stream handling in DLPack
  3. TODO: open an issue, regarding the way we translate torch device into DLPack device

In my opinion, since (I think) I'm not really changing the behavior of these functions (just slightly refactoring them), we could land these changes and open issues as needed.

@albanD do you think it's better for us to wait for (1) before landing this PR?
Let me know what you think.

@albanD (Collaborator) left a comment:

I'm ok with skipping XPU handling until we hear back from them.
It might be a bit broken though, not sure if we have a way to test it?

```cpp
if (optional_dl_device.has_value()) {
  auto device = at::getATenDevice(
      optional_dl_device->device_type,
      static_cast<c10::DeviceIndex>(optional_dl_device->device_id));
```
@albanD (Collaborator) commented:

Shouldn't you pass data in for xpu to work?

A collaborator replied:

Yes, currently XPU requires access to data within `getATenDevice`. @gujinghui, could we refine this logic?

A collaborator replied:

Can we pass the data ptr here so we don't break XPU? Longer term, we still need to check the device id against the data ptr. Thanks.

@ysiraichi (Collaborator, Author) replied Jul 4, 2025:

Is this the correct thing to do here, though? The problem I see is that the `const Tensor& data` parameter in this context does not necessarily live on the XPU device. In summary:

  • `data` is a tensor that, if necessary (e.g. `copy=true`), must be copied to `device = at::getATenDevice()`
  • `optional_dl_device` is something the user specifies when calling `tensor.__dlpack__()` (likely from an `otherlib.from_dlpack(tensor, device = resultShouldLiveHere)` call)

So, I guess, the actual problem here is: how do we retrieve a torch device for XPU? Or: what device should we pass to `tensor.to()` in order to move a tensor to XPU?


My guess is that the solution here is to branch on the XPU device type:

```cpp
c10::Device device = (optional_dl_device->device_type == DLDeviceType::kDLOneAPI)
    ? c10::Device(c10::kXPU)
    : at::getATenDevice(...);
```

The problem with this is that we will be ignoring the device index, which I'm not sure is ideal. Is there a way to create a tensor on a specific XPU device (accounting for the given index)?

@guangyey @gujinghui
What do you think?
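For reference, an explicit index can at least be encoded in a `torch.device` (this constructs even on builds without XPU); the open question remains whether `getATenDevice` can recover that index for XPU without the data pointer:

```python
import torch

# Constructing a device object with an explicit index works for any
# registered backend string, XPU included.
dev = torch.device("xpu", 1)
print(dev.type, dev.index)  # xpu 1
```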

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Jul 3, 2025
@ysiraichi
Copy link
Collaborator Author

@guangyey @gujinghui @albanD
After adding support for moving tensors onto XPU, I believe we need tests for it. So, I'm leaving it for a future PR.
I will probably try to merge this stack next Wednesday. So, if you think I should wait and fix this first, let me know before then.

@albanD (Collaborator) commented Jul 9, 2025:

Happy to leave XPU as-is for now and cleanup later yes.

@pytorchmergebot (Collaborator) commented:

Starting merge as part of PR stack under #150691

@pytorchmergebot (Collaborator) commented:
Starting merge as part of PR stack under #150691
