Description
🚀 The feature, motivation and pitch
I start my interaction with libtorch by calling at::set_num_threads(1). On a computer with many cores, I then observe 60+ threads being created and immediately destroyed, which pushes my program's CPU usage in htop well above 100%. I find this inefficient (and silly).
I guess the problem is that caffe2::pthreadpool() always initializes its static pool (https://github.com/pytorch/pytorch/blob/44dadf25065c73bd1370258e7fb1b421cee4283a/caffe2/utils/threadpool/pthreadpool-cpp.cc#L90C35-L90C55) with getDefaultNumThreads() threads first, and only afterwards honors the set_thread_count call (https://github.com/pytorch/pytorch/blob/aa31e7019a49e1d36b23a5132dc52f2414b65055/aten/src/ATen/ParallelNative.cpp#L234C9-L234C25) made in set_num_threads.
This could be fixed by passing nthreads to caffe2::pthreadpool() and falling back to getDefaultNumThreads() only when no usable value is provided (for example 0, which could be the default argument).
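To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It does not use the real PyTorch code: SketchPool, spawned_total, default_num_threads, sketch_pthreadpool and sketch_set_num_threads are all hypothetical stand-ins, and the pool only counts the worker threads it would have spawned instead of actually creating them. It assumes the real pool is a lazily initialized function-local static, as the linked line suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical counter: total number of worker threads the sketch pool
// would have spawned over its lifetime.
std::size_t& spawned_total() {
  static std::size_t n = 0;
  return n;
}

// Hypothetical stand-in for the real thread pool. It records its size and
// bumps the counter instead of creating real threads.
class SketchPool {
 public:
  explicit SketchPool(std::size_t n) : size_(n) {
    assert(n > 0);
    spawned_total() += n;
  }
  std::size_t size() const { return size_; }
  void set_thread_count(std::size_t n) {
    if (n != size_) {  // resizing tears down and respawns the workers
      spawned_total() += n;
      size_ = n;
    }
  }

 private:
  std::size_t size_;
};

// Stand-in for getDefaultNumThreads(): one thread per hardware core.
std::size_t default_num_threads() {
  unsigned hw = std::thread::hardware_concurrency();
  return hw == 0 ? 1 : hw;
}

// Proposed shape: nthreads == 0 means "no preference, use the default".
// The first call decides the initial size, so a caller that already knows
// the desired count never pays for the default-sized pool.
SketchPool& sketch_pthreadpool(std::size_t nthreads = 0) {
  static SketchPool pool(nthreads > 0 ? nthreads : default_num_threads());
  if (nthreads > 0 && pool.size() != nthreads) {
    pool.set_thread_count(nthreads);
  }
  return pool;
}

// Stand-in for set_num_threads: forwards the requested count to the pool
// accessor instead of fetching a default-sized pool and resizing it.
void sketch_set_num_threads(std::size_t nthreads) {
  sketch_pthreadpool(nthreads);
}
```

With this shape, calling sketch_set_num_threads(1) before anything else constructs the pool with exactly one worker, and later calls to sketch_pthreadpool() without an argument reuse it. Whether the real caffe2::pthreadpool() can take a parameter without affecting its other callers is something the maintainers would need to confirm.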
Alternatives
No response
Additional context
No response
cc @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10