groupby with dogroups (R expression) performance regression #4200

@jangorecki

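As far as I understand, there is a performance regression in by-group computations where R's C-level eval is run for each group (q7 and q8 in db-benchmark). The table below shows the 1e9-row groupby timings across commits. q7 and q8 jump sharply in the latest one (20200124_c005296).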

    in_rows question_group                    question 20181206_da98fb2 20190913_35b0de3 20191115_92abb70 20191205_eba8704 20191209_6808d2c 20191212_e0140ea 20191229_d52b0d8 20200124_c005296
 1:     1e9          basic               sum v1 by id1           21.144           10.026            8.375            9.362            9.060            9.082            9.366            9.271
 2:     1e9          basic           sum v1 by id1:id2           38.914           11.746            9.243            9.327            9.331            9.978           10.813            9.220
 3:     1e9          basic       sum v1 mean v3 by id3           99.517           14.487           12.044           12.291           13.496           14.325           13.191           13.169
 4:     1e9          basic           mean v1:v3 by id4           26.593           17.357           15.135           15.157           15.278           16.761           16.724           16.754
 5:     1e9          basic            sum v1:v3 by id6          122.214           14.569           13.454           14.035           14.046           14.842           14.400           15.019
 6:     1e9       advanced  median v3 sd v3 by id4 id5               NA               NA          121.742          110.925          106.340          113.837          111.984          123.411
 7:     1e9       advanced      max v1 - min v2 by id3               NA               NA           98.680           91.596           87.005           93.749           91.294          299.863
 8:     1e9       advanced       largest two v3 by id6               NA               NA          234.926          215.574          213.824          216.152          211.295          411.241
 9:     1e9       advanced regression v1 v2 by id2 id4               NA           72.466           81.297           76.121           75.311           74.769           72.571           41.157
10:     1e9       advanced     sum v3 count by id1:id6               NA          180.257          196.403          187.816          177.257          187.702          190.282          187.403
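To make the two affected questions concrete, here is a minimal sketch of q7- and q8-shaped queries on a tiny made-up table. This issue does not quote the exact db-benchmark query text. The j expressions below are therefore assumptions based on the question names ("max v1 - min v2 by id3", "largest two v3 by id6"). The result column names range_v1_v2 and largest2_v3 are hypothetical. The point is that both j expressions are compound, so (as we understand it) they are not GForce-optimised and go through the per-group eval path, unlike a plain sum(v1) by group.

```r
library(data.table)

# Small made-up table; only the column names follow db-benchmark's groupby data.
set.seed(108)
n <- 100L
x <- data.table(
  id3 = sprintf("id%03d", sample(5L, n, TRUE)),
  id6 = sample(5L, n, TRUE),
  v1  = sample(5L, n, TRUE),
  v2  = sample(15L, n, TRUE),
  v3  = round(runif(n, 0, 100), 4)
)

# q7-like query (assumed shape): a compound j expression (max minus min).
# As we understand it, this is not GForce-optimised, so j is evaluated per group.
q7 <- x[, .(range_v1_v2 = max(v1) - min(v2)), by = id3]

# q8-like query (assumed shape): the two largest v3 per group, also evaluated per group.
q8 <- x[order(-v3), .(largest2_v3 = head(v3, 2L)), by = id6]

# For contrast, a q1-like query (sum v1 by group), which GForce can handle.
q1 <- x[, .(v1 = sum(v1)), by = id3]

# verbose = TRUE makes data.table report how it optimised j (messages vary by version).
invisible(x[, .(range_v1_v2 = max(v1) - min(v2)), by = id3, verbose = TRUE])
```

Running the last line with verbose = TRUE is one way to check, on a given data.table version, whether GForce was applied or j fell back to being evaluated once per group.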

It is worth noting that over the same period q9, `x[, .(r2=cor(v1, v2)^2), by=.(id2, id4)]`, got a nice speed-up (from 72.571 to 41.157).
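A quick way to quantify the change between the last two commits (20191229_d52b0d8 and 20200124_c005296) is the ratio of their timings. The base R sketch below does this for q7, q8 and q9 using the numbers from the table. The column names d52b0d8 and c005296 are just shortened commit hashes chosen for illustration.

```r
# Timings from the table for the last two commits (same units as the table).
timings <- data.frame(
  question = c("max v1 - min v2 by id3",        # q7
               "largest two v3 by id6",         # q8
               "regression v1 v2 by id2 id4"),  # q9
  d52b0d8 = c(91.294, 211.295, 72.571),         # 20191229_d52b0d8
  c005296 = c(299.863, 411.241, 41.157)         # 20200124_c005296
)

# Ratio > 1 means the newer commit is slower; < 1 means it is faster.
timings$ratio <- round(timings$c005296 / timings$d52b0d8, 2)
print(timings[, c("question", "ratio")])
```

This gives ratios of about 3.28 for q7 and 1.95 for q8 (slower) and 0.57 for q9 (faster).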
