Labels
module: cuda: Related to torch.cuda, and CUDA support in general
module: dataloader: Related to torch.utils.data.DataLoader and Sampler
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
tensor.pin_memory() always asks for a context on the current device. This means that even if you use torch.device('cuda:1') everywhere in the program, a simple DataLoader(..., pin_memory=True) will still create a context on GPU 0.
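For concreteness, here is a minimal sketch of the behavior (the tensor shape is arbitrary, and it assumes a machine with at least two GPUs):

```python
import torch

# The program only ever targets cuda:1, but the *current* device is
# still the default cuda:0 when pin_memory() runs, so a context gets
# created on GPU 0 as well (visible in nvidia-smi as this process
# holding memory on GPU 0).
device = torch.device('cuda:1')
x = torch.randn(1024)                # plain CPU tensor
y = x.pin_memory()                   # pins via the current device's context (GPU 0)
z = y.to(device, non_blocking=True)  # the only device we meant to use
torch.cuda.synchronize(device)
```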
A little digging into cudaHostAlloc and our THCCachingHostAllocator tells me that:
- We allocate pinned memory with cudaHostAlloc(ptr, size, cudaHostAllocDefault).
- Pointers allocated this way can be used directly by any device, regardless of the current device at the time of allocation, since we assume unified addressing (a small Python-level sketch of this follows the list).
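To illustrate the second point at the Python level (my own example, not from the allocator code): a single pinned buffer can feed asynchronous copies to every visible device, no matter which context was current when it was pinned:

```python
import torch

# One pinned host buffer, allocated once, copied to every GPU. This
# works because, with unified addressing, memory from cudaHostAlloc is
# accessible from any device, not just the one whose context was
# current at allocation time.
pinned = torch.empty(1 << 20, pin_memory=True)
copies = [pinned.to(f'cuda:{i}', non_blocking=True)
          for i in range(torch.cuda.device_count())]
for i in range(torch.cuda.device_count()):
    torch.cuda.synchronize(i)
```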
Therefore I wonder whether, instead of always asking for a context on the current device, tensor.pin_memory() should just grab any CUDA context that already exists.
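In the meantime, a workaround sketch (my own suggestion, not part of the proposal; TensorDataset below is just a stand-in for a real dataset) is to make the intended GPU the current device before the first pinned allocation, so the context that does get created lands on the right card:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# With cuda:1 as the current device, the context created by the first
# pinned allocation lands on GPU 1 instead of GPU 0.
torch.cuda.set_device(1)
dataset = TensorDataset(torch.randn(64, 8))
loader = DataLoader(dataset, batch_size=16, pin_memory=True)
for (batch,) in loader:
    batch = batch.to('cuda:1', non_blocking=True)
```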
@colesbury pointed out that many other functions also create a context on the current device, but I think none of them is called as frequently as this one.