Memrecycle Performance Regression 1.14.3

Identified by @tlapak in #5366, there is a performance regression in 14.3 introduced by #4491 @jangorecki . I'm reporting here so we can separate the Pandas rolling time performance comparison from the regression issue.

The regression was specifically introduced by this commit: https://github.com/Rdatatable/data.table/pull/4491/commits/e793f53466d99f86e70fc2611b708ae8c601a451 . Basically, `snprintf()` performs poorly in spite of the better looking code. See also https://stackoverflow.com/questions/64600389/why-is-calling-snprintf-so-slow.

For performance, Stackoverflow seems to suggest `strcpy()`. I propose either rolling back the changes or using a helper function during the calls so that the string is only made when needed. Without any changes, users would see performance issues which would be most noticeable when there are many groups in non-Gforce j calls.



```r 
set.seed(123L)
n = 1e4L
dt = data.table(id = seq_len(n),
                val = rnorm(n))

### 1.14.2
system.time(dt[, .(vs = (sum(val))), by = .(id)])

##    user  system elapsed 
##    0.04    0.00    0.05

### after commit e793f53
system.time(dt[, .(vs = (sum(val))), by = .(id)])

##    user  system elapsed 
##    1.22    0.03    1.23 
```

_Edit:_ The main difference is that every call gets this assigned `targetDesc`. Previously, the character would only be made during warnings, errors, or higher levels of verbosity with types being mismatched. That is, the character would largely not be made. If the second line with `snprintf()` is commented out, the code runs fast.

``` c
  static char targetDesc[501];  // from 1.14.1 coerceAs reuses memrecycle for a target vector, PR#4491
  snprintf(targetDesc, 500, colnum==0 ? _("target vector") : _("column %d named '%s'"), colnum, colname);
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Memrecycle Performance Regression 1.14.3 #5371

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Memrecycle Performance Regression 1.14.3 #5371

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions