
[ROCm] converting to different device & dtype gives wrong results #16448

@ssnl

Description


🐛 Bug

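On ROCm builds, converting a float64 tensor on GPU 0 to a float32 tensor on GPU 1 gives wrong results when the device and dtype are changed in a single `.to()` call. Doing the two conversions in separate `.to()` calls gives the expected values.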

To Reproduce

    import torch

    # Needs at least two visible GPUs (ROCm builds expose them as 'cuda' devices).
    float64_cuda0 = torch.ones(10, device='cuda:0', dtype=torch.float64)
    float32_cuda1 = torch.zeros(10, device='cuda:1', dtype=torch.float32)
    print("float64_cuda0", float64_cuda0)
    print("float32_cuda1", float32_cuda1)
    print("float64_cuda0.to(float32_cuda1)", float64_cuda0.to(float32_cuda1))  # BUG!
    print("float64_cuda0.to(float32_cuda1.device)", float64_cuda0.to(float32_cuda1.device))
    print("float64_cuda0.to(float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.dtype))
    print("float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype))  # BUG!
    print("float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype)", float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype))
    print("float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device)", float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device))

Output:

    ('float64_cuda0', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0'))
    ('float32_cuda1', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:1',
           dtype=torch.float32))
    ('float64_cuda0.to(float32_cuda1)', tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:1',
           dtype=torch.float32))
    ('float64_cuda0.to(float32_cuda1.device)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1'))
    ('float64_cuda0.to(float32_cuda1.dtype)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0',
           dtype=torch.float32))
    ('float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype)', tensor([0.0000, 1.8750, 0.0000, 1.8750, 0.0000, 1.8750, 0.0000, 1.8750, 0.0000,
            1.8750], device='cuda:1', dtype=torch.float32))
    ('float64_cuda0.to(float32_cuda1.device).to(float32_cuda1.dtype)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1',
           dtype=torch.float32))
    ('float64_cuda0.to(float32_cuda1.dtype).to(float32_cuda1.device)', tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:1',
           dtype=torch.float32))
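The corrupted values from `float64_cuda0.to(float32_cuda1.device, float32_cuda1.dtype)` follow a recognizable pattern. On a little-endian device, float64 `1.0` is stored as the bytes `00 00 00 00 00 00 f0 3f`. Read as two float32 words, the low word is `0.0` and the high word `0x3ff00000` is `1.875`. The output is therefore consistent with the 40 bytes of the first five float64 elements being copied into the float32 destination with no dtype conversion. This is an inference from the printed values, not a confirmed root cause. The host-side sketch below uses only the standard `struct` module; the helper name `reinterpret_f64_as_f32` is hypothetical and assumes little-endian layout.

```python
import struct


def reinterpret_f64_as_f32(values):
    """Hypothetical helper: pack values as little-endian float64, then read
    the same raw bytes back as little-endian float32 (no numeric conversion)."""
    raw = struct.pack("<%dd" % len(values), *values)
    # Each float64 occupies 8 bytes, i.e. two float32 slots of 4 bytes.
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


# Ten float32 slots cover 40 bytes, i.e. only the first five float64 ones.
result = reinterpret_f64_as_f32([1.0] * 5)
print(result)  # [0.0, 1.875, 0.0, 1.875, 0.0, 1.875, 0.0, 1.875, 0.0, 1.875]
```

This matches the buggy tensor element for element, which suggests looking at the combined device-and-dtype copy path rather than at either conversion on its own.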

Additional context

I used this PR (#16431) to reproduce this bug. An example of the error log is at https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/9093/console

Labels: module: rocm (AMD GPU support for PyTorch)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions