>>> import torch
>>> a = torch.rand(5).cuda(1)  # both inputs live on GPU 1
>>> b = torch.rand(5).cuda(1)
>>> c = torch.cat([a, b], 0)
>>> c.get_device()             # output lands on GPU 0, the current device
0
>>> torch.cuda.set_device(1)
>>> c = torch.cat([a, b], 0)
>>> c.get_device()             # output follows the current device, not the inputs
1
This is a problem for nn.DataParallel and similar multi-GPU code: the output of torch.cat is allocated on the current device rather than on the inputs' device. Paszke says it might be a bug in the wrapper, because device switching based on the arguments is exactly what THCPAutoGPU is supposed to deal with.
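
Until that is fixed, one workaround is to pin the current device to the inputs' device around the call. A minimal sketch, assuming the torch.cuda.device context manager (which temporarily changes the current device) is available in the installed version:

import torch

a = torch.rand(5).cuda(1)
b = torch.rand(5).cuda(1)

# Temporarily make the inputs' device the current device, so the
# output of cat is allocated on the same GPU as a and b.
with torch.cuda.device(a.get_device()):
    c = torch.cat([a, b], 0)

assert c.get_device() == 1  # output now stays on GPU 1

This just does by hand what THCPAutoGPU is expected to do automatically from the arguments' device.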