Closed
Labels: GForce (issues relating to optimized grouping calculations), High, dev
Hi,
I'm using the dev version:
> data.table::update.dev.pkg()
R data.table package is up-to-date at eed712ef45fd9198de6aa1ac1b672a7347253d18 (1.14.3)
because I can't wait for some of the new features (especially the optimized shift with by and the great new env argument!).
I ran into a problematic behaviour of := combined with by:
> dt <- data.table(by1 = c("a","a","b","b"), by2 = c("c","d","c","d"), value=c("ac","ad","bc","bd"))
> dt[,same_value:=value[1], .(by1, by2)][]
Argument 'by' after substitute: .(by1, by2)
Detected that j uses these columns: [value]
Finding groups using forderv ... forder.c received 4 rows and 2 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'value[1]'
GForce optimized j to '`g[`(value, 1)'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
gforce assign high and low took 0.000
gforce eval took 0.000
0.000s elapsed (0.000s cpu)
Assigning to all 4 rows
RHS_list_of_columns == false
RHS for item 1 has been duplicated because NAMED==2 MAYBE_SHARED==1, but then is being plonked. length(values)==4; length(cols)==1)
      by1    by2  value same_value
   <char> <char> <char>     <char>
1:      a      c     ac         ac
2:      a      d     ad         ad
3:      b      c     bc         bc
4:      b      d     bd         bd
> dt[,same_value:=value[1], .(by2, by1)][]
Argument 'by' after substitute: .(by2, by1)
Detected that j uses these columns: [same_value, value]
Finding groups using forderv ... forder.c received 4 rows and 2 columns
0.000s elapsed (0.000s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 4
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'value[1]'
GForce optimized j to '`g[`(value, 1)'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
gforce assign high and low took 0.001
gforce eval took 0.000
0.000s elapsed (0.000s cpu)
Assigning to 4 row subset of 4 rows
RHS_list_of_columns == false
      by1    by2  value same_value
   <char> <char> <char>     <char>
1:      a      c     ac         ac
2:      a      d     ad         bc
3:      b      c     bc         ad
4:      b      d     bd         bd
Clearly same_value is expected to equal value regardless of the order of the by columns, since every (by1, by2) group contains exactly one row. In the second call, rows 2 and 3 are swapped. It works correctly with version 1.14.2.
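The expectation can be written as a small self-checking script. This is only a sketch of the check, not a diagnosis. It assumes data.table is installed, and it uses by = explicitly rather than positionally as in the calls above. On an unaffected version both checks should pass silently; on the affected dev build the second one should stop with an error.

```r
library(data.table)

# Same table as in the report: each (by1, by2) pair occurs exactly once.
dt <- data.table(by1   = c("a", "a", "b", "b"),
                 by2   = c("c", "d", "c", "d"),
                 value = c("ac", "ad", "bc", "bd"))

# First grouping order: the reported output was correct here.
dt[, same_value := value[1], by = .(by1, by2)]
stopifnot(identical(dt$same_value, dt$value))

# Swapped grouping order: every group still has a single row, so the
# result must be the same. This is the check that fails in the report.
dt[, same_value := value[1], by = .(by2, by1)]
stopifnot(identical(dt$same_value, dt$value))
```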
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.3
loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0