Performance improvement on sparse CUDA coalesce()

`coalesce()` on CUDA is more than 10x slower than on CPU for a small sparse tensor:
```
>>> import torch
>>> from random import randint
>>> n = 100
>>> I = torch.tensor([[randint(0, 99) for _ in range(3)] for _ in range(n)])
>>> V = torch.randn(n)
>>> size = torch.Size([100, 100, 100])
>>> S = torch.sparse_coo_tensor(I.t(), V, size)

>>> %timeit S.coalesce()
1000 loops, best of 3: 255 µs per loop

>>> S = torch.sparse_coo_tensor(I.t(), V.cuda(), size)
>>> %timeit torch.cuda.synchronize(); S.coalesce(); torch.cuda.synchronize();
1000 loops, best of 3: 3.52 ms per loop
```
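
For reference, what `coalesce()` is expected to compute can be sketched in plain Python: entries that share the same index are merged by summing their values, and the result is ordered by index. This is only an illustration of the semantics, not PyTorch's CPU or CUDA implementation; `coalesce_reference` is a hypothetical helper name, and it works on Python lists rather than tensors.

```python
from collections import defaultdict


def coalesce_reference(indices, values):
    """Merge duplicate coordinates by summing their values (illustrative only).

    indices: list of coordinate tuples, one per stored entry
    values:  list of numbers, same length as indices
    Returns (unique_indices, summed_values), sorted lexicographically by index.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    # Accumulate the value of every entry under its coordinate.
    sums = defaultdict(float)
    for idx, val in zip(indices, values):
        sums[tuple(idx)] += val

    # Emit each coordinate once, in sorted order.
    unique = sorted(sums)
    return unique, [sums[i] for i in unique]


idx = [(0, 1, 2), (5, 5, 5), (0, 1, 2)]
val = [1.0, 2.0, 3.0]
print(coalesce_reference(idx, val))
# → ([(0, 1, 2), (5, 5, 5)], [4.0, 2.0])
```

In the benchmark above only 100 entries are drawn from 100 x 100 x 100 possible coordinates, so duplicate indices are rare and very little actual merging takes place.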

Performance improvement on sparse CUDA coalesce() #10757

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Performance improvement on sparse CUDA coalesce() #10757

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions