I am sure there is some legacy knowledge here that I don't have, so I'm hoping one of the maintainers can elaborate.
Question: Why is blas_thread_init() called via the constructor mechanism in driver/others/memory.c, when there is also code inside exec_blas_async() that checks whether the server has been started and calls blas_thread_init() if needed? If we always initialize at startup, is there a use case where the server is shut down and then later restarted?
The concern I have is that blas_thread_init() allocates a default number of threads (i.e., the number of CPU cores) before any possible call to openblas_set_num_threads(). On a 64-core machine, 64 threads are created even if the code intends to limit usage to a single thread/core. This forces the worker_thread code path to deal with an odd case where the actual number of worker threads exceeds the number of threads we ever intend to use.
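To make the ordering problem concrete, here is a minimal sketch of a load-time constructor that sizes a pool before the caller can express a preference. It is not the OpenBLAS code: every `demo_*` name and both counters are hypothetical stand-ins, and `__attribute__((constructor))` assumes GCC or Clang.

```c
#include <unistd.h>

/* Hypothetical stand-ins for the real pool bookkeeping. */
static int threads_spawned = 0;  /* workers that actually exist */
static int threads_wanted  = 0;  /* workers the caller asked for */

/* Default pool size: one worker per online CPU (POSIX sysconf). */
static int demo_default_thread_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Runs at load time, before main() and before any user call. */
__attribute__((constructor))
static void demo_thread_init(void) {
    threads_wanted  = demo_default_thread_count();
    threads_spawned = threads_wanted;  /* a real server would create threads here */
}

/* Lowering the limit afterwards changes the target, not the existing workers. */
void demo_set_num_threads(int n) {
    threads_wanted = n;
}

/* Workers that exist but will never be given work. */
int demo_surplus_threads(void) {
    return threads_spawned - threads_wanted;
}
```

Calling `demo_set_num_threads(1)` as the first statement of `main()` still leaves a surplus of (core count - 1) workers, which is the odd case described above.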
What I'd like to do is avoid paying the cost of an initialization check on every submission of work if we can safely guarantee that the server is always initialized on startup. But obviously that is not possible if there are valid startup-shutdown-startup patterns.
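As a rough illustration of why the per-submission check cannot simply be dropped if shutdown followed by restart is a valid pattern, consider this sketch. The names (`demo_submit`, `demo_server_shutdown`, `init_calls`) are hypothetical and the work queue is omitted; it only models the started/not-started flag.

```c
#include <stdatomic.h>

static atomic_int server_started = 0;  /* hypothetical "server is running" flag */
static int init_calls = 0;             /* counts initializations, for illustration */

/* Stand-in for the real initialization routine. */
static void demo_server_init(void) {
    init_calls++;
    atomic_store_explicit(&server_started, 1, memory_order_release);
}

/* Stand-in for an explicit shutdown; clears the flag. */
void demo_server_shutdown(void) {
    atomic_store_explicit(&server_started, 0, memory_order_release);
}

/* Every submission pays for one flag load. This sketch assumes a single
 * submitting thread; concurrent first submissions would need a lock or
 * pthread_once-style guard around the init call. */
int demo_submit(void) {
    if (!atomic_load_explicit(&server_started, memory_order_acquire))
        demo_server_init();
    /* ... enqueue work here ... */
    return init_calls;
}
```

If no shutdown path existed, the flag would be set once at load time and the branch in `demo_submit()` would be dead weight. With a shutdown path, the first submission after shutdown relies on that branch to bring the server back.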
Alternatively, I'd prefer to kill any extra worker threads when an openblas_set_num_threads() call reduces the number of threads. More threads can be created later as needed.
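The shrink-and-grow strategy could look roughly like the sketch below. It uses POSIX threads for illustration rather than the Win32 API, and it is not the proposed PR: `pool_t`, `POOL_MAX`, and `pool_set_num_threads` are hypothetical names, the workers do no real work, and the resize function is assumed to be called from a single control thread.

```c
#include <pthread.h>
#include <stdint.h>

#define POOL_MAX 64  /* hypothetical upper bound on workers */

typedef struct {
    pthread_t       tid[POOL_MAX];
    int             live;    /* workers currently running */
    int             target;  /* workers that should keep running */
    pthread_mutex_t lock;
    pthread_cond_t  wake;
} pool_t;

static pool_t pool = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .wake = PTHREAD_COND_INITIALIZER,
};

/* A worker keeps running only while its index is below the target. */
static void *pool_worker(void *arg) {
    intptr_t id = (intptr_t)arg;
    pthread_mutex_lock(&pool.lock);
    while (id < pool.target) {
        /* a real server would dequeue and run work here */
        pthread_cond_wait(&pool.wake, &pool.lock);
    }
    pthread_mutex_unlock(&pool.lock);
    return NULL;
}

/* Resize the pool to n workers; returns the number actually running. */
int pool_set_num_threads(int n) {
    if (n < 0) n = 0;
    if (n > POOL_MAX) n = POOL_MAX;

    pthread_mutex_lock(&pool.lock);
    int old = pool.live;
    pool.target = n;
    pthread_cond_broadcast(&pool.wake);  /* let surplus workers notice and exit */
    pthread_mutex_unlock(&pool.lock);

    /* Shrink: wait for workers with index >= n to finish. */
    for (int i = n; i < old; i++)
        pthread_join(pool.tid[i], NULL);

    /* Grow: create missing workers, stopping at the first failure. */
    int live = old < n ? old : n;
    for (int i = old; i < n; i++) {
        if (pthread_create(&pool.tid[i], NULL, pool_worker,
                           (void *)(intptr_t)i) != 0)
            break;
        live = i + 1;
    }

    pthread_mutex_lock(&pool.lock);
    pool.live = live;
    pool.target = live;
    pthread_mutex_unlock(&pool.lock);
    return live;
}
```

With this shape the worker count never exceeds the requested limit, so the worker loop does not need a special case for surplus threads; the trade-off is that a resize now pays for thread creation or joining.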
Any thoughts? I have an amended PR ready to go for the latter strategy in blas_server_win32.c.