
Conversation

@ysiraichi (Collaborator) commented Mar 28, 2025:

Stack from ghstack (oldest at bottom):

This PR introduces the rest of the keyword arguments added in DLPack
version 2023.12: `dl_device` and `copy`.

In summary, we handle these arguments in the C++ implementation of
`to_dlpack(...)` at _torch/csrc/Module.cpp_, by calling the
`maybeCopyTensor` function at _aten/src/ATen/DLConvertor.cpp_. This PR also
introduces the following changes:

  • Add a new C++ API `torchDeviceToDLDevice()`, which is simply a
    refactoring of the `getDLDevice()` function at
    _aten/src/ATen/DLConvertor.cpp_.
  • Add both keyword arguments to the `from_dlpack()` function at
    _torch/utils/dlpack.py_ and to the `Tensor.__dlpack__()` dunder
    method.
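For illustration, the intended semantics at the Python level (a minimal sketch; `dl_device` takes a `(device_type, device_id)` tuple as returned by `__dlpack_device__()`, and `1` is `kDLCPU` in _dlpack.h_):

```python
import torch

x = torch.arange(4)

# Ask the producer for a capsule on a given DLPack device, forcing a copy.
# dl_device is a (device_type, device_id) tuple; (1, 0) is kDLCPU, index 0.
capsule = x.__dlpack__(dl_device=(1, 0), copy=True)

y = torch.from_dlpack(capsule)
y[0] = 42                # copy=True: mutating y must not affect x
assert x[0].item() == 0
```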

@pytorch-bot bot commented Mar 28, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150218

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 65d6b8f with merge base 7cc1a95:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ysiraichi added a commit that referenced this pull request Mar 28, 2025
ghstack-source-id: d2b2c14
Pull Request resolved: #150218
@ysiraichi ysiraichi added module: dlpack release notes: python_frontend python frontend release notes category labels Mar 28, 2025
@ysiraichi ysiraichi requested review from albanD and rgommers March 28, 2025 21:15
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ghstack-source-id: 96bf013
Pull Request resolved: pytorch/pytorch#150218
```cpp
ctx.device_id = static_cast<int32_t>(static_cast<unsigned char>(device_id));
switch (tensor.device().type()) {

ctx.device_id = (device.is_cuda() || device.is_privateuseone())
```
@albanD (Collaborator) commented May 14, 2025:

Why this new logic? Why would we ignore the device index being passed in?

@ysiraichi (Collaborator, Author) replied:

I just took it from the place where it was being called before. I needed it factored out, since I'm now also calling this function from elsewhere.

@albanD (Collaborator) replied:

Not sure what you mean by this?

@ysiraichi (Collaborator, Author) replied Jun 20, 2025:

Previously, this is how we were calling this function (the only call site):

```cpp
c10::DeviceIndex device_id = 0;
if (src.is_cuda() || src.is_privateuseone()) {
  device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.device = getDLDevice(src, device_id);
```

So I basically moved that code into this function. In other words, the logic did not change: the device index will always be either `src.get_device()` (i.e. `src.device().index()`) if the condition `src.is_cuda() || src.is_privateuseone()` is true, or 0 otherwise.
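For reference, the resulting DLPack device tuple is visible from Python:

```python
import torch

x = torch.ones(2)             # CPU tensor
print(x.__dlpack_device__())  # (1, 0): kDLCPU with device index 0

if torch.cuda.is_available():
    y = torch.ones(2, device="cuda")
    print(y.__dlpack_device__())  # (2, 0): kDLCUDA with the tensor's index
```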

@albanD (Collaborator) replied:

Ok, I guess it is another case where we should have an issue and tag other accelerator backends.

```cpp
      tensor, at::DLPackTraits<T>::capsule, DLPack_Capsule_Destructor<T>);
}

return nullptr;
```
@albanD (Collaborator) commented:

I don't recall: will this have the right Python error set?

@ysiraichi (Collaborator, Author) replied:

I think so. At least, it will throw an error inside the parser. I will replace it with Py_RETURN_NONE, just for consistency.

@ysiraichi (Collaborator, Author) added:

I think I hadn't understood your question earlier. If you are asking whether a C++ exception will be mapped to the correct Python error set, the answer is: yes! `END_HANDLE_TH_ERRORS` will take care of that. Specifically, the `torch::translate_exception_to_python(std::current_exception())` call will do the job.
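For illustration, any C++-side error already surfaces as a regular Python exception; the trigger below is arbitrary:

```python
import torch

try:
    torch.ones(2).to("not_a_device")  # arbitrary trigger for a C++-side error
except RuntimeError as e:
    # The C++ exception was translated into a Python RuntimeError by the
    # HANDLE_TH_ERRORS / END_HANDLE_TH_ERRORS machinery.
    print(type(e).__name__)  # RuntimeError
```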

@albanD (Collaborator) replied:

Returning None here is completely different; these are in no way interchangeable.
Returning nullptr from these APIs means that something went wrong and the caller should check the globally set error for more info. Returning None means that all went well and the result is "None".
You either want one or the other :D
Or maybe you're saying this is dead code?

@ysiraichi (Collaborator, Author) replied:

This is essentially dead code.
On second thought, it would be better to just `TORCH_INTERNAL_ASSERT(r.idx == 0);`. Then there would be no need to return None, since the bug would be in the arg parser.

kwargs["dl_device"] = device

ext_device = ext_tensor.__dlpack_device__()
# ext_device is either CUDA or ROCm, we need to pass the current
@albanD (Collaborator) commented:

Note that a lot of this string handling should get generalized to any device that is an accelerator in PT.
OK to do later, but since we're going to see issues with hip or xpu, we should just refactor it together.

@ysiraichi (Collaborator, Author) replied:

Could you help me understand what string handling you are referring to? Doesn't torch.device(device) work with any accelerator in PT?

@albanD (Collaborator) replied Jun 16, 2025:

Only that the custom processing is done only for CUDA and ROCm, but we have quite a few other accelerators that would require the same treatment in the future.
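For context, a condensed paraphrase of that CUDA/ROCm-only branch (a sketch, not the exact source; the helper name is made up):

```python
import torch

# DLPack device-type codes from dlpack.h: kDLCUDA == 2, kDLROCM == 10.
kDLCUDA, kDLROCM = 2, 10

def _stream_kwargs_for(ext_tensor):
    """Hypothetical helper paraphrasing the branch under discussion: only
    CUDA/ROCm producers get a synchronization stream passed to __dlpack__;
    every other accelerator currently falls through with no stream."""
    kwargs = {}
    device_type, device_id = ext_tensor.__dlpack_device__()
    if device_type in (kDLCUDA, kDLROCM):
        stream = torch.cuda.current_stream(device=device_id)
        # The array API standard reserves stream value 1 for CUDA's default
        # legacy stream; otherwise the raw stream pointer is passed.
        if device_type == kDLCUDA and stream.cuda_stream == 0:
            kwargs["stream"] = 1
        else:
            kwargs["stream"] = stream.cuda_stream
    return kwargs
```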

@ysiraichi (Collaborator, Author) replied:

Since I'm not familiar with these other accelerators, I think I will leave it for a future PR. In this one, I'm simply adding support for the extra keywords (in this case, passing them through to the `__dlpack__()` method). What do you think?

@albanD (Collaborator) replied:

Yes, let's just open an issue for it and add the xpu and privateuse1 labels on it.

ysiraichi added 7 commits May 24, 2025 11:55
@ysiraichi (Collaborator, Author) commented:

Summarizing the current status of this PR:

  1. We are currently waiting for @EikanWang's feedback on why we need the data pointer in order to get a torch device
  2. TODO: open an issue for XPU and PrivateUse1, regarding stream handling in DLPack
  3. TODO: open an issue, regarding the way we translate torch device into DLPack device

In my opinion, since (I think) I'm not really changing the behavior of these functions (just slightly refactoring them), we could land these changes and open issues as needed.

@albanD do you think it's better for us to wait for (1) before landing this PR?
Let me know what you think.

@albanD (Collaborator) left a comment:

I'm ok with skipping XPU handling until we hear back from them.
It might be a bit broken though, not sure if we have a way to test it?

```cpp
if (optional_dl_device.has_value()) {
  auto device = at::getATenDevice(
      optional_dl_device->device_type,
      static_cast<c10::DeviceIndex>(optional_dl_device->device_id));
```
@albanD (Collaborator) commented:

Shouldn't you pass data in for xpu to work?

A collaborator replied:

Yes, currently XPU requires access to data within `getATenDevice`. @gujinghui, could we refine this logic?

A collaborator replied:

Can we pass the data ptr here so we don't break XPU? Longer term, we still need to check the device id against the data ptr. Thanks.

@ysiraichi (Collaborator, Author) replied Jul 4, 2025:

Is this the correct thing to do here, though? The problem I see is that the `const Tensor& data` parameter in this context does not necessarily live on the XPU device. In summary:

  • `data` is a tensor that, if necessary (e.g. `copy=true`), must be copied to `device = at::getATenDevice()`
  • `optional_dl_device` is something the user specifies when calling `tensor.__dlpack__()` (likely from an `otherlib.from_dlpack(tensor, device = resultShouldLiveHere)` call)

So, I guess, the actual problem here is: how do we retrieve a torch device for XPU? Or: what device should we pass to `tensor.to()` in order to move a tensor to XPU?


My guess is that the solution here is to branch on the XPU device type:

```cpp
c10::Device device = (optional_dl_device->device_type == DLDeviceType::kDLOneAPI)
    ? c10::Device(c10::kXPU)
    : at::getATenDevice(...);
```

The problem with this is that we will be ignoring the device index, which I'm not sure is ideal. Is there a way to create a tensor on a specific XPU device (accounting for the given index)?

@guangyey @gujinghui
What do you think?
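For reference, an explicit index can at least be encoded in a `torch.device` (this constructs even on builds without XPU); the open question remains whether `getATenDevice` can recover that index for XPU without the data pointer:

```python
import torch

# Constructing a device object with an explicit index works for any
# registered backend string, XPU included.
dev = torch.device("xpu", 1)
print(dev.type, dev.index)  # xpu 1
```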

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Jul 3, 2025
@ysiraichi
Copy link
Collaborator Author

@guangyey @gujinghui @albanD
After adding support for moving tensors onto XPU, I believe we need tests for it. So, I'm leaving it for a future PR.
I will probably try to merge this stack next Wednesday. So, if you think I should wait and fix this first, let me know before then.

@albanD (Collaborator) commented Jul 9, 2025:

Happy to leave XPU as-is for now and cleanup later yes.

@pytorchmergebot (Collaborator) commented:

Starting merge as part of PR stack under #150691

@pytorchmergebot (Collaborator) commented:
Starting merge as part of PR stack under #150691
