Better use number of physical cores instead of logical cores by default #3298

@renkun-ken

It is great that the latest release (v1.12.0) adds more multithreading support in subsetting and enables restore-after-fork behavior by default. The current default number of threads is omp_get_max_threads(), which aims for maximal performance on a completely idle server. In practice, however, on a slightly busier server this default makes multithreaded data.table operations take much longer.

In my case, I'm using a server with 40 physical cores (RhpcBLASctl::get_num_cores()) and 80 threads (RhpcBLASctl::get_num_procs()). When no other tasks are running, data.table occupies all 80 threads by default, which works well. But if other running tasks occupy even a small number of threads (e.g. 5-10), multithreaded data.table takes much longer than running with no multithreading at all, since those CPUs are blocked.

I'm not sure whether it makes sense to occupy all threads by default. I'd recommend using the number of physical cores as the default instead, which seems more practical and may perform better when other tasks are using CPUs or when hyperthreading is enabled.
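As a possible workaround in the meantime, the thread count can be capped by hand at the physical core count. This is only a sketch: `choose_threads()` is a hypothetical helper written for this example, not part of data.table, and it assumes RhpcBLASctl reports the core counts correctly on the machine in question.

```r
# Hypothetical helper (not part of data.table): prefer the physical core
# count, fall back to the logical CPU count, and finally to a single thread.
choose_threads <- function(physical, logical) {
  candidates <- c(physical, logical)
  # Drop unknown (NA) or nonsensical (< 1) counts.
  candidates <- candidates[!is.na(candidates) & candidates >= 1]
  if (length(candidates) == 0L) return(1L)
  # Physical cores never exceed logical CPUs, so the minimum is the
  # physical count whenever it is known.
  as.integer(min(candidates))
}

# Apply the cap only if both packages are installed.
if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  n <- choose_threads(RhpcBLASctl::get_num_cores(),
                      RhpcBLASctl::get_num_procs())
  data.table::setDTthreads(n)
  message("data.table now uses ", data.table::getDTthreads(), " threads")
}
```

On the server described above this would select 40 threads rather than 80. Whether that is the best value on a busy machine is untested here; it only mirrors the default proposed in this issue.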
