
Commit 49082f9

Ryo-not-rio authored and pytorchmergebot committed
parallelize sort (#142391)
- use `__gnu_parallel::sort` for GCC compilations
- add a parallelized version of `std::sort` and `std::stable_sort` for non-GCC compilations

Using `__gnu_parallel::sort` provides a ~3.7x speedup for length-50000 sorts with NUM_THREADS=16 and NUM_THREADS=4 on aarch64.

Performance was measured with the following script:

```python
import torch
import torch.autograd.profiler as profiler

torch.manual_seed(0)

N = 50000
x = torch.randn(N, dtype=torch.float)

# Sort the same 50000-element tensor 1000 times under the profiler.
with profiler.profile(with_stack=True, profile_memory=False, record_shapes=True) as prof:
    for i in range(1000):
        _, _ = torch.sort(x)

# Report the 10 ops with the highest self CPU time, grouped by input shape.
print(prof.key_averages(group_by_input_shape=True).table(sort_by='self_cpu_time_total', row_limit=10))
```

Pull Request resolved: #142391
Approved by: https://github.com/malfet
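The second bullet mentions a parallelized `std::sort`/`std::stable_sort`. A common shape for such a sort is: split the input into contiguous chunks, sort the chunks concurrently, then stably k-way merge them. The minimal Python sketch below illustrates only that decomposition; `parallel_stable_sort` is a hypothetical name, not a PyTorch or libstdc++ API. Because CPython's GIL serializes the pure-Python comparisons, the thread pool shows the structure of the algorithm rather than a real speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_stable_sort(values, num_workers=4, key=None):
    """Hypothetical helper: sort `values` by splitting it into contiguous
    chunks, sorting each chunk in a worker thread, then k-way merging.

    Each chunk is sorted with the stable built-in `sorted`, and heapq.merge
    breaks ties in favour of earlier chunks, so equal keys keep their
    original relative order (the overall sort is stable).
    """
    values = list(values)
    n = len(values)
    if n < 2 or num_workers < 2:
        return sorted(values, key=key)

    # Contiguous chunks keep equal elements in their original order across
    # chunk boundaries, which the stable merge relies on.
    chunk = (n + num_workers - 1) // num_workers
    chunks = [values[i:i + chunk] for i in range(0, n, chunk)]

    # pool.map returns results in submission order, i.e. chunk order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_chunks = list(pool.map(lambda c: sorted(c, key=key), chunks))

    return list(heapq.merge(*sorted_chunks, key=key))


if __name__ == "__main__":
    print(parallel_stable_sort([5, 3, 9, 1, 3, 7, 2, 8], num_workers=3))
    # [1, 2, 3, 3, 5, 7, 8, 9]
```

Stability comes from two choices: the chunks are contiguous slices, and `heapq.merge` behaves like `sorted` over the concatenated inputs, so ties resolve toward the earlier chunk.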
1 parent 7725d0b commit 49082f9


1 file changed, +6 -0 lines changed


cmake/Codegen.cmake

Lines changed: 6 additions & 0 deletions
```diff
@@ -389,6 +389,12 @@ if(INTERN_BUILD_ATEN_OPS)
 else(MSVC)
   set(EXTRA_FLAGS "-DCPU_CAPABILITY=${CPU_CAPABILITY} -DCPU_CAPABILITY_${CPU_CAPABILITY}")
 endif(MSVC)
+
+# Only parallelize the SortingKernel for now to avoid side effects
+if(${NAME} STREQUAL "native/cpu/SortingKernel.cpp" AND NOT MSVC AND USE_OMP)
+  string(APPEND EXTRA_FLAGS " -D_GLIBCXX_PARALLEL")
+endif()
+
 # Disable certain warnings for GCC-9.X
 if(CMAKE_COMPILER_IS_GNUCXX)
   if(("${NAME}" STREQUAL "native/cpu/GridSamplerKernel.cpp") AND ("${CPU_CAPABILITY}" STREQUAL "DEFAULT"))
```
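The new condition checks the kernel file name, the toolchain and `USE_OMP`, but not `CPU_CAPABILITY`, so every per-capability build of `native/cpu/SortingKernel.cpp` gets `-D_GLIBCXX_PARALLEL` (libstdc++'s switch for its OpenMP-based parallel mode) when OpenMP is enabled, while all other kernels are unaffected. The Python model below is a hypothetical sketch of that flag assembly for the non-MSVC path only, not PyTorch code; `non_msvc_extra_flags` is an invented name, and the capability list in the usage loop (`"AVX2"` in particular) is an illustrative assumption.

```python
SORTING_KERNEL = "native/cpu/SortingKernel.cpp"


def non_msvc_extra_flags(name, cpu_capability, use_omp):
    """Hypothetical model (not PyTorch code) of the EXTRA_FLAGS string the
    hunk builds for one kernel file and one CPU capability on a non-MSVC
    toolchain."""
    # Mirrors the else(MSVC) branch: every kernel gets its capability macros.
    flags = f"-DCPU_CAPABILITY={cpu_capability} -DCPU_CAPABILITY_{cpu_capability}"
    # Mirrors the new block. "NOT MSVC" always holds on this path, so only the
    # file-name and USE_OMP checks remain. The leading space matches
    # string(APPEND EXTRA_FLAGS " -D_GLIBCXX_PARALLEL").
    if name == SORTING_KERNEL and use_omp:
        flags += " -D_GLIBCXX_PARALLEL"
    return flags


if __name__ == "__main__":
    # Kernel names come from the hunk; the capability list is illustrative.
    for name in (SORTING_KERNEL, "native/cpu/GridSamplerKernel.cpp"):
        for cap in ("DEFAULT", "AVX2"):
            print(name, cap, "->", non_msvc_extra_flags(name, cap, use_omp=True))
```

Scoping the macro to a single translation unit, as the diff's comment says, keeps parallel-mode `std::sort` and friends from silently replacing the standard algorithms in every other kernel.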
