[ROCm] converting to different device & dtype gives wrong results #16448

## 🐛 Bug

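On ROCm builds, converting a `float64` tensor on `cuda:0` to a `float32` tensor on `cuda:1` in a single `.to()` call gives wrong results. Changing only the device, changing only the dtype, or chaining the two conversions as separate `.to()` calls all give correct results.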

## To Reproduce
```py
import torch

# Source tensor on GPU 0 (float64) and a reference tensor on GPU 1 (float32).
float64_cuda0 = torch.ones(10, device='cuda:0', dtype=torch.float64)
float32_cuda1 = torch.zeros(10, device='cuda:1', dtype=torch.float32)
print("float64_cuda0", float64_cuda0)
print("float32_cuda1", float32_cuda1)
# Device and dtype changed in one call: wrong results.
print("float64_cuda0.to(float32_cuda1)", float64_cuda0.to(float32_cuda1))  # BUG!
# Only the device or only the dtype changed: correct.
print("float64_cuda0.to(float32_cuda1.device)", float64_cuda0.to(float32_cuda1.device))
print("float64_cuda0.to(float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.dtype))
print("float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype))  # BUG!
# Device and dtype changed in two chained calls, in either order: correct.
print("float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype))
print("float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device)", float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device))
```
Output:
```
15:53:37 ('float64_cuda0', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0'))
15:53:37 ('float32_cuda1', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:1',
15:53:37        dtype=torch.float32))
15:53:37 ('float64_cuda0.to(float32_cuda1)', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:1',
15:53:37        dtype=torch.float32))
15:53:37 ('float64_cuda0.to(float32_cuda1.device)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1'))
15:53:37 ('float64_cuda0.to(float32_cuda1.dtype)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0',
15:53:37        dtype=torch.float32))
15:53:37 ('float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype)', tensor([0.0000, 1.8750, 0.0000, 1.8750, 0.0000, 1.8750, 0.0000, 1.8750, 0.0000,
15:53:37         1.8750], device='cuda:1', dtype=torch.float32))
15:53:37 ('float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1',
15:53:37        dtype=torch.float32))
15:53:37 ('float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1',
15:53:37        dtype=torch.float32))
```
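
Since the chained conversions in the output above are correct, a possible workaround until this is fixed is to split the conversion into two `.to()` calls. The sketch below is only an illustration: `to_device_and_dtype` and `fused_matches_two_step` are hypothetical helper names, not part of PyTorch, and I have only observed the two-step path working for the `float64` to `float32` case shown here.

```py
import torch


def to_device_and_dtype(tensor, device, dtype):
    """Hypothetical workaround: change dtype and device in two separate calls.

    The cast happens on the source device first, then the result is copied
    to the target device.
    """
    return tensor.to(dtype).to(device)


def fused_matches_two_step(tensor, device, dtype):
    """Hypothetical check: does the single-call conversion agree with the two-step one?"""
    fused = tensor.to(device, dtype)
    reference = to_device_and_dtype(tensor, device, dtype)
    return torch.equal(fused, reference)


# Only meaningful with at least two GPUs; skipped otherwise.
if torch.cuda.device_count() >= 2:
    x = torch.ones(10, device='cuda:0', dtype=torch.float64)
    print("fused matches two-step:",
          fused_matches_two_step(x, torch.device('cuda:1'), torch.float32))
```

On an affected ROCm build I would expect the check to report a mismatch, given the output above. Casting before copying also moves half as many bytes between devices in the `float64` to `float32` case.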

## Additional context

I used this PR (https://github.com/pytorch/pytorch/pull/16431) to reproduce this bug. An example of the error log is at https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/9093/console
