Commit 49082f9
parallelize sort (#142391)
- use __gnu_parallel::sort for gcc compilations
- add a parallelized version of std::sort and std::stable_sort for non gcc compilations
Using __gnu_parallel::sort:
provides ~3.7x speed up for length 50000 sorts with NUM_THREADS=16 and NUM_THREADS=4 on aarch64
The performance is measured using the following script:
```python
import torch
import torch.autograd.profiler as profiler
torch.manual_seed(0)
N = 50000
x = torch.randn(N, dtype=torch.float)
with profiler.profile(with_stack=True, profile_memory=False, record_shapes=True) as prof:
for i in range(1000):
_, _ = torch.sort(x)
print(prof.key_averages(group_by_input_shape=True).table(sort_by='self_cpu_time_total', row_limit=10))
```
Pull Request resolved: #142391
Approved by: https://github.com/malfet1 parent 7725d0b commit 49082f9
1 file changed
+6
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
389 | 389 | | |
390 | 390 | | |
391 | 391 | | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
392 | 398 | | |
393 | 399 | | |
394 | 400 | | |
| |||
0 commit comments