Unnecessary cuda synchronizations that we should remove in PyTorch #108968
Description
🚀 The feature, motivation and pitch
There are a number of unnecessary cuda synchronizations in PyTorch ops, and I think we should endeavor to remove them whenever possible.
To check syncs, you can use torch.cuda.set_sync_debug_mode("warn")
I'm creating this issue to track ones that I've seen/found.
- torch.multinomial with num_samples=1. For this I think we should simply remove the error check causing the sync, and ideally turn it into a cuda async error. https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/Distributions.cpp#L615
A = torch.rand(10, device='cuda')  # must be a cuda tensor for the sync to occur
torch.multinomial(A, num_samples=1)
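One reason the num_samples=1 path doesn't need the host-side error check is that a single draw can be taken with the Gumbel-max trick, which samples proportionally to unnormalized non-negative weights and never needs to verify that the probabilities sum to 1. A minimal pure-Python sketch of that idea (an illustration, not PyTorch's actual CUDA kernel):

```python
import math
import random

def multinomial_one(weights):
    # Gumbel-max trick: argmax(log w_i + g_i), with g_i standard Gumbel noise,
    # returns index i with probability w_i / sum(w). No normalization pass and
    # no "weights sum to zero" reduction is needed, which is why the
    # num_samples=1 case could skip the device-to-host error check entirely.
    best_val, best_idx = -math.inf, -1
    for i, w in enumerate(weights):
        if w > 0:
            g = -math.log(-math.log(random.random()))  # Gumbel(0, 1) sample
            val = math.log(w) + g
            if val > best_val:
                best_val, best_idx = val, i
    return best_idx
```

On the GPU the argmax would be a single reduction kernel, so invalid inputs could surface as an async error instead of a blocking check.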
- repeat_interleave with a tensor number of repeats encourages synchronization. We cannot use repeats with a non-cuda tensor, and that forces a synchronization. For this I think we should add a list-of-ints overload or allow passing a CPU tensor for repeats.
A = torch.randn(3, device='cuda')
num_repeats = torch.tensor([2, 3, 5])
out = torch.repeat_interleave(A, num_repeats.cuda(), dim=0)
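The underlying problem is that the output length is sum(repeats): when the repeats tensor lives on the device, that sum has to be copied back to the host before the output can be allocated. A pure-Python sketch of the op's semantics (a hypothetical helper, not the PyTorch implementation), which makes the size dependence explicit:

```python
def repeat_interleave(values, repeats):
    # Output length is sum(repeats). On CUDA, if `repeats` is a device
    # tensor, computing that length on the host to size the allocation is
    # exactly the device-to-host synchronization this issue describes. A
    # CPU tensor or list-of-ints overload would make the size known without
    # touching the device.
    out = []
    for v, r in zip(values, repeats):
        out.extend([v] * r)
    return out
```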
- Indexing with a scalar tensor performs a synchronization. See Turn indexing with a scalar tensor into a view and avoid a D2H synchronization #105641 for more details.
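Roughly, returning a view for x[idx] requires idx as a host integer to compute the storage offset, so a GPU scalar index has to be copied back; producing a copy via a gather kernel instead would keep the index on the device. A hypothetical pure-Python sketch of the offset computation (illustrative names, not PyTorch internals):

```python
def index_select_scalar(data, row_len, idx):
    # A view of row `idx` is just (offset = idx * row_len, length = row_len).
    # Computing `offset` needs `idx` as a host integer; if `idx` lives on the
    # GPU, reading it back is a D2H sync. A gather kernel producing a copy
    # could consume `idx` on the device and avoid the sync.
    start = idx * row_len
    return data[start:start + row_len]
```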
- torch.normal also incurs a sync on std: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/DistributionTemplates.h#L222
- nanmedian incurs a sync: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/cuda/Sorting.cpp#L149
- prod_backward: torch.prod cannot be used with cudagraphs #128396
Alternatives
No response
Additional context
No response
cc @ptrblck