Use aten's GRAIN_SIZE for TH Tensor ops (#28770)

peterbell10 · facebook-github-bot · commit fe8804695bd5 · 2019-10-31T07:18:46.000-07:00
Summary:
Fixes #28198 in my tests on a 24-core AMD Threadripper.

Profiling the benchmark showed that most of the slowdown in #28198 came from `THFloatTensor_fill` not being distributed across threads. It internally uses `TH_TENSOR_APPLY_CONTIG`, which is a thin wrapper around `at::parallel_for` and uses `TH_OMP_OVERHEAD_THRESHOLD`, or 100,000, as the grain size. Here I've changed it to use `at::internal::GRAIN_SIZE`, which is 32,768, so ~1/3 of the old value.

I think it makes sense to unify these two values so any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent, but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`     | `addmm`  |
|:----------:|--------------------:|---------------:|---------:|---------:|
| master     | 3505.85 ms/it       | 184.302 ms     | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it       | 184.456 ms     | 5.810 ms | 5.069 ms |
| this PR    | 3453.23 ms/it       | 184.526 ms     | 5.824 ms | 5.202 ms |

Pull Request resolved: #28770
Differential Revision: D18202646
Pulled By: ezyang
fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
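To illustrate why the grain size quoted in the summary matters, here is a minimal sketch of how a grain-size threshold can gate parallel execution. It is a simplified model and not the actual `at::parallel_for` implementation: the helpers `would_parallelize` and `max_chunks` and the constants `kOldThreshold` and `kGrainSize` are hypothetical names introduced for this example, and the rule "split only when the range has at least `grain_size` elements" is an assumption. Only the two numeric values come from the commit message.

```cpp
#include <algorithm>
#include <cstdint>

// Values quoted in the commit message; the constant names are local to this sketch.
const int64_t kOldThreshold = 100000;  // former TH_OMP_OVERHEAD_THRESHOLD
const int64_t kGrainSize = 32768;      // at::internal::GRAIN_SIZE, per the summary

// Hypothetical helper (assumed rule, not ATen's code): a range is split across
// threads only when it holds at least grain_size elements.
inline bool would_parallelize(int64_t numel, int64_t grain_size) {
  return numel >= grain_size;
}

// Hypothetical helper: upper bound on the number of chunks, assuming each chunk
// should hold roughly grain_size elements and at most num_threads chunks run.
inline int64_t max_chunks(int64_t numel, int64_t grain_size, int64_t num_threads) {
  if (!would_parallelize(numel, grain_size)) {
    return 1;  // below the threshold: run serially in a single chunk
  }
  const int64_t chunks = (numel + grain_size - 1) / grain_size;  // ceiling division
  return std::min(num_threads, chunks);
}
```

Under these assumptions, a fill over 50,000 elements stays serial with the old threshold (`max_chunks(50000, kOldThreshold, 24)` is 1) but can be split in two with the new one (`max_chunks(50000, kGrainSize, 24)` is 2). Mid-sized tensors of this kind are where the change would be expected to have an effect.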
diff --git a/aten/src/TH/generic/THTensorApply.hpp b/aten/src/TH/generic/THTensorApply.hpp
@@ -4,10 +4,10 @@
   #define NAN (nan(NULL))
 #endif
 
-#define HYPER_TH_OMP_OVERHEAD_THRESHOLD 2000
-#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD 20000
-#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD 50000
-#define TH_OMP_OVERHEAD_THRESHOLD 100000
+#define HYPER_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 16)
+#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 4)
+#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 2)
+#define TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE)
 
 #define TH_CHECK_SAME_SIZE(TENSOR1, TENSOR2) \
 { \