Don't create caffe2::pthreadpool() with getDefaultNumThreads()-many threads in set_num_threads(1) #134714

@quickbeam123

🚀 The feature, motivation and pitch

I start my interaction with libtorch by calling at::set_num_threads(1). On a machine with many cores, I observe 60+ threads being created and immediately destroyed, which pushes my program's CPU usage in htop well above 100%. I find this inefficient (and silly).

I guess the problem is that caffe2::pthreadpool() always initializes its static pool (https://github.com/pytorch/pytorch/blob/44dadf25065c73bd1370258e7fb1b421cee4283a/caffe2/utils/threadpool/pthreadpool-cpp.cc#L90C35-L90C55) with getDefaultNumThreads() threads first, and only afterwards honors the set_thread_count call (https://github.com/pytorch/pytorch/blob/aa31e7019a49e1d36b23a5132dc52f2414b65055/aten/src/ATen/ParallelNative.cpp#L234C9-L234C25) made in set_num_threads.

This could be fixed by passing nthreads to caffe2::pthreadpool() and falling back to getDefaultNumThreads() only when no meaningful value is provided (for example 0, which could be the parameter's default).
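
To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the actual caffe2 code: ToyThreadPool, toy_pthreadpool() and default_num_threads() are hypothetical stand-ins for caffe2::PThreadPool, caffe2::pthreadpool() and getDefaultNumThreads(), and the toy pool only records its size instead of spawning threads.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for getDefaultNumThreads(); the real function
// may apply additional platform-specific logic.
inline std::size_t default_num_threads() {
  const unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : n;  // hardware_concurrency() may return 0 if unknown
}

// Hypothetical stand-in for the pool. It only records the requested size,
// so the sketch stays self-contained and creates no real threads.
class ToyThreadPool {
 public:
  explicit ToyThreadPool(std::size_t thread_count)
      : thread_count_(thread_count) {}
  std::size_t get_thread_count() const { return thread_count_; }

 private:
  std::size_t thread_count_;
};

// Sketch of the proposed accessor: the first call decides the initial size.
// nthreads == 0 means "no preference", so the default is used instead.
inline ToyThreadPool* toy_pthreadpool(std::size_t nthreads = 0) {
  static std::unique_ptr<ToyThreadPool> pool;
  static std::mutex mutex;
  std::lock_guard<std::mutex> lock(mutex);
  if (!pool) {
    pool = std::make_unique<ToyThreadPool>(
        nthreads != 0 ? nthreads : default_num_threads());
  }
  return pool.get();
}
```

With an accessor shaped like this, set_num_threads(1) could call toy_pthreadpool(1) and the pool would be created with a single thread from the start, rather than with the default count followed by a resize. In this sketch, later calls with a different value do not resize the pool; that would remain the job of the existing set_thread_count path.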

Alternatives

No response

Additional context

No response

cc @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

Labels

actionable
module: cpu (CPU specific problem, e.g., perf, algorithm)
module: performance (Issues related to performance, either of kernel code or framework glue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
