Closed
Labels: GForce (issues relating to optimized grouping calculations), dev
Description
I don't fully understand what's going on, but it seems that row order is affecting the result of a by calculation (sum, which I assume is being GForce optimized). The answers also appear to be simply wrong: there should be only one unique value per id2, yet some groups end up with more than one.
library('data.table')
set.seed(10)
n = 100000
a = data.table(id1 = 1:n, id2 = sample(1:900,n,replace = T), flag = sample(c(0,0,0,1),n, replace = T))
b = copy(a)
#shuffle
a = a[sample(seq_len(nrow(a)), nrow(a))]
a[, t1 := sum(flag, na.rm = T), id2]
setorder(a,id1)
a[, t2 := sum(flag, na.rm = T), id2]
any(a[,t1!=t2])
#> [1] TRUE
any(a[, length(unique(t1))>1, id2]$V1)
#> [1] TRUE
any(a[, length(unique(t2))>1, id2]$V1)
#> [1] TRUE
#Without using gforce optimization
a = copy(b)
#alias sum under a name GForce does not recognize, so the optimization is skipped
sum2 = sum
#shuffle
a = a[sample(seq_len(nrow(a)), nrow(a))]
a[, t1 := sum2(flag, na.rm = T), id2]
setorder(a,id1)
a[, t2 := sum2(flag, na.rm = T), id2]
any(a[,t1!=t2])
#> [1] FALSE
any(a[, length(unique(t1))>1, id2]$V1)
#> [1] FALSE
any(a[, length(unique(t2))>1, id2]$V1)
#> [1] FALSE
Session info:
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.3
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 knitr_1.34 magrittr_2.0.1 R6_2.5.1 rlang_0.4.11 fastmap_1.1.0
[7] fansi_0.4.2 highr_0.9 tools_4.1.2 xfun_0.23 utf8_1.2.1 cli_3.1.0
[13] clipr_0.7.1 withr_2.4.2 htmltools_0.5.2 ellipsis_0.3.2 yaml_2.2.1 digest_0.6.27
[19] tibble_3.1.2 lifecycle_1.0.0 crayon_1.4.1 processx_3.5.2 callr_3.7.0 ps_1.6.0
[25] vctrs_0.3.8 fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.11 reprex_2.0.1
[31] compiler_4.1.2 pillar_1.6.1 pkgconfig_2.0.3
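As an alternative to the sum2 alias used in the second block above, GForce can probably also be switched off globally while checking this. The sketch below assumes that setting the datatable.optimize option to 1 keeps the basic optimizations but disables GForce (my reading of the data.table options documentation, not something verified against this dev build). The names old, mismatch and multi_t1 are just illustrative.

```r
library(data.table)

# Assumption: optimization level 1 is below the level at which GForce kicks in.
old <- options(datatable.optimize = 1L)

set.seed(10)
n <- 100000
a <- data.table(id1 = 1:n,
                id2 = sample(1:900, n, replace = TRUE),
                flag = sample(c(0, 0, 0, 1), n, replace = TRUE))

# shuffle, compute the grouped sum, reorder, compute it again
a <- a[sample(seq_len(nrow(a)))]
a[, t1 := sum(flag, na.rm = TRUE), by = id2]
setorder(a, id1)
a[, t2 := sum(flag, na.rm = TRUE), by = id2]

# TRUE if the two passes disagree on any row
mismatch <- a[, any(t1 != t2)]
# TRUE if any id2 group holds more than one distinct t1 value
multi_t1 <- a[, uniqueN(t1) > 1L, by = id2][, any(V1)]

# restore the previous option value
options(old)
```

If both mismatch and multi_t1 come back FALSE here while the plain sum version at the default optimization level still disagrees, that would point at the GForce code path rather than at the grouping itself.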