Closed
I have noticed a huge performance drop in data.table by-group (loop-like) operations after upgrading from 1.12.8 to 1.13.0; the new version appears to be the cause.
library(data.table)
library(microbenchmark)

dt <- data.table(
  id = 1:20000,
  list_col = sample(c('', '', 'a', 'a:b', 'a:b:c'), 20000, TRUE)
)
feature <- 'list_col'

microbenchmark(
  long_dt <- dt[, c("id", feature), with = FALSE][
    , feature_names := {
      x <- get(feature)
      stringr::str_split(x, ':')
    }][
    , .(
      feature_names = paste0(feature, "_", unlist(feature_names))
    )
    , by = "id"]
  , times = 10
  , unit = 'ms'
)
data.table 1.12.8, default settings, using 6 threads (times in ms):
     min       lq     mean   median       uq      max neval
122.2447 149.6991 173.3268 183.5777 193.9876 201.7234    10

data.table 1.13.0, default settings, using 6 threads (times in ms):
     min       lq     mean   median       uq      max neval
12820.75  12913.1 12989.59 13007.94  13065.1 13097.85    10
I have also tried several combinations of thread counts and throttle values, but saw no improvement at all.
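For anyone who wants to repeat the thread/throttle comparison, a minimal sketch follows. It is not the exact code used for the timings above: `time_grouped_step` is a hypothetical helper, the table is shrunk to 2000 rows so it finishes quickly, base `strsplit` is swapped in for `stringr::str_split`, and the particular thread and throttle values are arbitrary assumptions. The `throttle` argument is only passed when the installed `setDTthreads` accepts it.

```r
library(data.table)

# Hypothetical helper (not part of data.table): time the by-group step once
# under whatever thread settings are currently active.
time_grouped_step <- function(dt, feature) {
  system.time(
    dt[, .(feature_names = paste0(
             feature, "_",
             # base strsplit used here instead of stringr::str_split
             unlist(strsplit(get(feature), ":", fixed = TRUE)))),
       by = "id"]
  )[["elapsed"]]
}

set.seed(1)   # assumption: seed added only so the sketch is repeatable
n <- 2000L    # assumption: smaller than the 20000 rows in the report
dt <- data.table(
  id = seq_len(n),
  list_col = sample(c("", "", "a", "a:b", "a:b:c"), n, TRUE)
)
feature <- "list_col"

# Older releases have no `throttle` argument, so check before using it.
has_throttle <- "throttle" %in% names(formals(setDTthreads))
old_threads <- getDTthreads()

# Arbitrary example settings; adjust to the machine being tested.
grid <- CJ(
  threads  = c(1L, 2L),
  throttle = if (has_throttle) c(1L, 1024L) else NA_integer_
)
grid[, elapsed := NA_real_]

for (i in seq_len(nrow(grid))) {
  if (has_throttle) {
    setDTthreads(threads = grid$threads[i], throttle = grid$throttle[i])
  } else {
    setDTthreads(threads = grid$threads[i])
  }
  set(grid, i, "elapsed", time_grouped_step(dt, feature))
}

# Restore the previous thread count; 1024 is assumed to be the default throttle.
if (has_throttle) {
  setDTthreads(old_threads, throttle = 1024L)
} else {
  setDTthreads(old_threads)
}

print(grid)
```

Running the same script under both package versions gives one elapsed time per setting, which makes it easier to see whether any combination changes the result.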