Performance improvement on sparse CUDA coalesce() #10757

@weiyangfb

Description

coalesce() on a CUDA sparse tensor is slower than on a CPU one (3.52 ms vs. 255 µs in the timings below, roughly 14x):

>>> import torch
>>> from random import randint
>>> n = 100
>>> I = torch.tensor([[randint(0, 99) for _ in range(3)] for _ in range(n)])
>>> V = torch.randn(n)
>>> size = torch.Size([100, 100, 100])
>>> S = torch.sparse_coo_tensor(I.t(), V, size)

>>> %timeit S.coalesce()
1000 loops, best of 3: 255 µs per loop

>>> S = torch.sparse_coo_tensor(I.t(), V.cuda(), size)
>>> %timeit torch.cuda.synchronize(); S.coalesce(); torch.cuda.synchronize();
1000 loops, best of 3: 3.52 ms per loop
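
For context on what is being timed: coalesce() merges entries that share the same index by summing their values, and returns the indices in sorted (lexicographic) order. The pure-Python sketch below illustrates those semantics only. `coalesce_reference` is a hypothetical helper name, not part of PyTorch, and the sketch makes no claim about how the CPU or CUDA kernels are actually implemented.

```python
from collections import defaultdict


def coalesce_reference(indices, values):
    """Illustrative reference for sparse COO coalescing (not PyTorch code).

    indices: list of index tuples, one per stored entry (duplicates allowed)
    values:  list of numbers, same length as indices
    Returns (unique_sorted_indices, summed_values).
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    # Sum the values of all entries that share the same index tuple.
    totals = defaultdict(float)
    for idx, val in zip(indices, values):
        totals[tuple(idx)] += val

    # Emit the unique indices in lexicographic order.
    unique_sorted = sorted(totals)
    return unique_sorted, [totals[idx] for idx in unique_sorted]


idx, val = coalesce_reference([(2, 0, 1), (0, 5, 5), (2, 0, 1)], [1.0, 2.0, 3.0])
print(idx)  # [(0, 5, 5), (2, 0, 1)]
print(val)  # [2.0, 4.0]
```

With n = 100 random indices drawn from a 100 x 100 x 100 grid, as in the snippet above, duplicates are rare, so the measured time is likely dominated by fixed per-call overhead rather than by the amount of merging work; that reading is an assumption, not something measured here.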

Labels

todo: Not as important as medium or high priority tasks, but we will work on these.