I wonder whether a multi-threaded group-by is technically possible. I tested a multi-threaded group-by in Julia using a divide-and-conquer algorithm, and it made sum-by faster. So if data.table had a multi-threaded group-by, things should speed up even more.
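
For illustration, a divide-and-conquer sum-by could look roughly like the Python sketch below. This is not the Julia code mentioned above and says nothing about data.table internals; the names `partial_sum_by` and `parallel_sum_by` and the `n_chunks` parameter are hypothetical. The idea is to split the rows into chunks, sum each chunk per key independently, then merge the partial sums. Note that in CPython the global interpreter lock means these threads will not actually run the pure-Python loop in parallel, so the sketch shows the structure of the approach rather than a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_sum_by(keys, values):
    """Sum values per key within a single chunk (hypothetical helper)."""
    sums = defaultdict(float)
    for k, v in zip(keys, values):
        sums[k] += v
    return sums


def parallel_sum_by(keys, values, n_chunks=4):
    """Divide rows into chunks, sum each chunk per key, then merge."""
    n = len(keys)
    # Ceiling division so every row lands in some chunk.
    step = max(1, -(-n // n_chunks))
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]

    # Conquer: each worker aggregates its own slice independently,
    # so no locking is needed while summing.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(
            pool.map(
                lambda b: partial_sum_by(keys[b[0]:b[1]], values[b[0]:b[1]]),
                bounds,
            )
        )

    # Combine: merge the per-chunk dictionaries into one result.
    total = defaultdict(float)
    for part in partials:
        for k, v in part.items():
            total[k] += v
    return dict(total)


if __name__ == "__main__":
    groups = ["a", "b", "a", "c", "b", "a"]
    amounts = [1, 2, 3, 4, 5, 6]
    print(parallel_sum_by(groups, amounts, n_chunks=3))
```

The merge step is cheap when the number of distinct groups is small relative to the number of rows, which is the case where this kind of split should pay off most.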