1.13.0 slow down in a repeated loop on list column #4658

@sandoronodi

I have noticed a huge performance drop in data.table loop operations, possibly caused by upgrading to 1.13.0:

library(data.table)
library(microbenchmark)

dt <- data.table('id'=1:20000,
                 'list_col'=sample(c('', '', 'a', 'a:b', 'a:b:c'), 20000, TRUE))
feature <- 'list_col'

microbenchmark(
  long_dt <- dt[, c("id", feature), with = FALSE][
    , feature_names := {
      x <- get(feature)
      stringr::str_split(x, ':')
    }][
      , .(
        feature_names = paste0(feature, "_", unlist(feature_names))
      )
      , by = "id"]
  , times = 10
  , unit = 'ms'
)
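For readers who want to see what the benchmarked expression computes, here is a minimal sketch on a three-row table. It is not the original reproduction: it swaps in base `strsplit()` for `stringr::str_split()` purely to avoid the extra dependency, and `small` and `res` are hypothetical names introduced for illustration.

```r
library(data.table)

# Tiny stand-in for the 20000-row table above (hypothetical data).
small <- data.table(id = 1:3, list_col = c("", "a", "a:b"))
feature <- "list_col"

# Step 1: split each string on ':' so that every row holds a character
# vector, i.e. build a list column. Base strsplit() is an assumption here,
# used in place of stringr::str_split().
small[, feature_names := strsplit(get(feature), ":", fixed = TRUE)]

# Step 2: for each id, flatten that row's list element and prefix every
# token with the feature name. With one row per id, this is one tiny group
# per row, which is the step that is repeated 20000 times in the benchmark.
res <- small[, .(feature_names = paste0(feature, "_", unlist(feature_names))),
             by = "id"]
print(res)
```

The result has one row per split token, so an id whose string contains two tokens (here id 3) contributes two rows.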

data.table 1.12.8, default settings, using 6 threads (times in ms):

      min       lq     mean   median       uq      max neval
 122.2447 149.6991 173.3268 183.5777 193.9876 201.7234    10

data.table 1.13.0, default settings, using 6 threads (times in ms):

      min      lq     mean   median      uq      max neval
 12820.75 12913.1 12989.59 13007.94 13065.1 13097.85    10

I have also tried several different thread and throttle combinations, but have seen no improvement at all.
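The exact combinations tried are not listed, so the following is only a sketch of how such a sweep could be set up with `setDTthreads()`. It assumes data.table 1.13.0 or later (where `setDTthreads()` accepts `throttle`) and assumes 1024 is the default throttle to restore afterwards. `time_with`, `run` and `grid` are hypothetical names, and the table is a smaller stand-in rather than the one from the report.

```r
library(data.table)

# Hypothetical helper: time one call under a given thread/throttle setting,
# then restore the previous thread count and the assumed default throttle.
time_with <- function(threads, throttle, expr_fun) {
  old <- getDTthreads()
  setDTthreads(threads = threads, throttle = throttle)
  on.exit(setDTthreads(threads = old, throttle = 1024L), add = TRUE)
  system.time(expr_fun())[["elapsed"]]
}

# Smaller stand-in data set (assumption: 2000 rows keeps the sweep quick).
dt <- data.table(id = 1:2000,
                 list_col = sample(c("", "a", "a:b"), 2000, TRUE))

# A grouped operation with one small group per id, similar in shape
# to the benchmarked expression.
run <- function() {
  dt[, .(n = length(unlist(strsplit(list_col, ":", fixed = TRUE)))), by = "id"]
}

# Every combination of the chosen thread counts and throttle values.
grid <- CJ(threads = c(1L, 2L), throttle = c(1L, 1024L, 65536L))
secs <- mapply(time_with, grid$threads, grid$throttle,
               MoreArgs = list(expr_fun = run))
grid[, elapsed := secs]
print(grid)
```

Comparing the `elapsed` column across rows shows whether either setting affects the timing on a given machine; the thread counts and throttle values above are arbitrary choices.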
