Performance drop with 1.12.0 (Selection + assignment) #3395

@JeremyBesson

I'm not sure whether this is the same issue as #3330, because I haven't set the DTthreads parameter to 1.
However, I could imagine that the default value is 1.
The following list of operations takes less than 1 second to execute with data.table v1.11.8.
With data.table v1.12.0 it takes more than 4 seconds!

>   myDataTable[colJ != "production", colK:=na.omit(colK), by=.(colA)]
>   myDataTable[colJ != "production", colH:=na.omit(colH), by=.(colA)]
>   myDataTable[colJ != "production", colL:=na.omit(colL), by=.(colA)]
>   myDataTable[colJ == "code",colM:=colE, by=.(colA)]
>   myDataTable[,colM:=na.omit(colM), by=.(colA)]
>   myDataTable[is.na(colM), colM:=""]
>   myDataTable[colJ == "test",colN:=colE, by=.(colA)]
>   myDataTable[,colN:=na.omit(colN), by=.(colA)]

And there are only 4 rows in my data table:

print(myDataTable)

  colA colB colC colD colE colF colG colH colI colJ colK colL colM colN
1 text text text text text text text text text text text text text text
2 text text text text text text text text text text text text text text
3 text text text text text text text text text text text text text text
4 text text text text text text text text text text text text text text

(You can replace "text" with any value.)
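
For anyone who wants to try this, here is a minimal self-contained sketch of the setup described above. The values placed in colA and colJ are assumptions, chosen only so that the filters and the grouping match some rows; the real data is different. The call to `getDTthreads()` is there just to show the thread count in effect, since I have not changed it myself.

```r
library(data.table)

# Hypothetical stand-in for the real table: 4 rows, 14 character columns
# (colA ... colN), every cell filled with the placeholder "text".
cols <- paste0("col", LETTERS[1:14])
myDataTable <- as.data.table(
  setNames(replicate(length(cols), rep("text", 4L), simplify = FALSE), cols)
)

# Assumed values so that the grouping and the filters match some rows.
myDataTable[, colA := c("a", "a", "b", "b")]
myDataTable[, colJ := c("code", "test", "code", "test")]

# Number of threads data.table uses in this session.
print(getDTthreads())

# Time the same sequence of operations as in the report.
timing <- system.time({
  myDataTable[colJ != "production", colK := na.omit(colK), by = .(colA)]
  myDataTable[colJ != "production", colH := na.omit(colH), by = .(colA)]
  myDataTable[colJ != "production", colL := na.omit(colL), by = .(colA)]
  myDataTable[colJ == "code", colM := colE, by = .(colA)]
  myDataTable[, colM := na.omit(colM), by = .(colA)]
  myDataTable[is.na(colM), colM := ""]
  myDataTable[colJ == "test", colN := colE, by = .(colA)]
  myDataTable[, colN := na.omit(colN), by = .(colA)]
})
print(timing)
```

Running this once with v1.11.8 and once with v1.12.0 on the same machine should show whether the difference reproduces outside our production server.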

We see this problem only in our production environment, not in our other environments. The production server is very busy, which could explain why the operations take so long. However, the test with v1.11.8 was run in the same environment without any performance issue (we ran this test many times with both versions).

We run R in a Docker container based on Ubuntu 16.04.

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.4

I hope this will be fixed in a future data.table release, because we are stuck on v1.11.8 until this issue is resolved.
Thank you for your support.
