library(data.table)

allIterations <- data.frame(v1 = runif(1e5), v2 = runif(1e5))

## Stand-in for a more complicated per-row function
DoSomething <- function(row) {
  someCalculation <- row[["v1"]] + 1
}
system.time({
  for (r in 1:nrow(allIterations)) {
    DoSomething(allIterations[r, ])
  }
})
## user system elapsed
## 4.50 0.02 4.55
## Same loop, but with allIterations converted to a data.table
allIterations <- as.data.table(allIterations)

system.time({
  for (r in 1:nrow(allIterations)) {
    DoSomething(allIterations[r, ])
  }
})
## user system elapsed
## 53.78 25.05 78.46

I'm working on an R project that involves applying fairly complicated functions across a data.table or data.frame by rows.
In cases where vectorizing is not a good option, one might need to loop through rows, and that's when I realized selecting by row number from a data.table is actually much slower than from a data.frame.
I guess selecting by row number is not a recommended practice with data.table? Or would the team be interested in looking into this and optimizing the performance?
I have more details about my test here.
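In case it helps, below is a minimal sketch of the kind of workaround I have been experimenting with (the object and column names just mirror the toy example above, not my real data): building each row as a plain named list from the column vectors with mapply() avoids calling `[.data.table` inside the loop, which appears to be where the per-row overhead comes from.

## Workaround sketch: pass each row to DoSomething() as a plain named list
## built from the column vectors, instead of subsetting the data.table by
## row number on every iteration.
library(data.table)

allIterations <- as.data.table(data.frame(v1 = runif(1e5), v2 = runif(1e5)))

DoSomething <- function(row) {
  row[["v1"]] + 1
}

system.time({
  results <- mapply(
    function(v1, v2) DoSomething(list(v1 = v1, v2 = v2)),
    allIterations$v1,
    allIterations$v2
  )
})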