Commit fe88046

peterbell10 authored and facebook-github-bot committed
Use aten's GRAIN_SIZE for TH Tensor ops (#28770)
Summary:
Fixes #28198 in my tests on a 24 core AMD threadripper.

Profiling the benchmark showed that most of the slowdown in #28198 was from `THFloatTensor_fill` not being distributed across threads. It internally uses `TH_TENSOR_APPLY_CONTIG` which is a thin wrapper around `at::parallel_for` and uses `TH_OMP_OVERHEAD_THRESHOLD` or 100,000 as the grain size.

Here I've changed it to use `at::internal::GRAIN_SIZE` which is 32,768 so ~1/3 of the old value. I think it makes sense to unify these two values so any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`     | `addmm`  |
|:----------:|--------------------:|---------------:|---------:|---------:|
| master     | 3505.85 ms/it       | 184.302 ms     | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it       | 184.456 ms     | 5.810 ms | 5.069 ms |
| this PR    | 3453.23 ms/it       | 184.526 ms     | 5.824 ms | 5.202 ms |

Pull Request resolved: #28770
Differential Revision: D18202646
Pulled By: ezyang
fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
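
To illustrate why the grain size matters for a call like `THFloatTensor_fill`, here is a minimal sketch of the dispatch decision. It assumes a simplified model in which a range is split across threads only when it holds at least `grain_size` elements. The helper `runs_in_parallel` and the 50,000-element size are hypothetical and chosen for illustration; they are not taken from ATen or from the benchmark in #28198.

```cpp
#include <cassert>
#include <cstdint>

// Grain sizes quoted in the commit message.
constexpr int64_t kOldGrainSize = 100000;  // former TH_OMP_OVERHEAD_THRESHOLD
constexpr int64_t kNewGrainSize = 32768;   // at::internal::GRAIN_SIZE

// Hypothetical helper (not an ATen function): simplified model in which
// work is distributed across threads only if the range is at least
// grain_size elements long; smaller ranges run on the calling thread.
constexpr bool runs_in_parallel(int64_t numel, int64_t grain_size) {
  return numel >= grain_size;
}

// Illustrative size: a 50,000-element fill sits between the two grain sizes,
// so under this model it is serial before the change and parallel after it.
inline void check_dispatch_sketch() {
  constexpr int64_t numel = 50000;  // assumed size, for illustration only
  assert(!runs_in_parallel(numel, kOldGrainSize));
  assert(runs_in_parallel(numel, kNewGrainSize));
}
```

Under this model, any tensor with between 32,768 and 99,999 elements changes from serial to parallel execution with this commit, while larger and smaller tensors are dispatched as before.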
1 parent 9630b78 commit fe88046

1 file changed: +4 -4 lines changed

aten/src/TH/generic/THTensorApply.hpp

Lines changed: 4 additions & 4 deletions
@@ -4,10 +4,10 @@
 #define NAN (nan(NULL))
 #endif
 
-#define HYPER_TH_OMP_OVERHEAD_THRESHOLD 2000
-#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD 20000
-#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD 50000
-#define TH_OMP_OVERHEAD_THRESHOLD 100000
+#define HYPER_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 16)
+#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 4)
+#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 2)
+#define TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE)
 
 #define TH_CHECK_SAME_SIZE(TENSOR1, TENSOR2) \
 { \
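
For reference, the values the new macros resolve to can be worked out from the grain size quoted in the commit message. The sketch below hard-codes 32,768 as a stand-in for `at::internal::GRAIN_SIZE` so that it compiles without ATen. The constant names are hypothetical and only mirror the macro names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for at::internal::GRAIN_SIZE, using the value stated in the
// commit message. This is an assumption: the real constant lives in ATen.
constexpr int64_t kGrainSize = 32768;

// Hypothetical constants mirroring the four macros after this commit.
constexpr int64_t kHyperThreshold     = kGrainSize / 16;  // was 2000
constexpr int64_t kOrdinThreshold     = kGrainSize / 4;   // was 20000
constexpr int64_t kUncertainThreshold = kGrainSize / 2;   // was 50000
constexpr int64_t kBaseThreshold      = kGrainSize;       // was 100000

// Compile-time checks of the resulting values.
static_assert(kHyperThreshold == 2048, "GRAIN_SIZE / 16");
static_assert(kOrdinThreshold == 8192, "GRAIN_SIZE / 4");
static_assert(kUncertainThreshold == 16384, "GRAIN_SIZE / 2");
static_assert(kBaseThreshold == 32768, "GRAIN_SIZE");
```

The old ratios to `TH_OMP_OVERHEAD_THRESHOLD` were 1/50, 1/5 and 1/2. The new ones are 1/16, 1/4 and 1/2, so the "hyper" threshold stays close to its old absolute value (2048 against 2000) while the other three shrink along with the base.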