Closed
Description
In previous versions of data.table, dt1 <- dt[TRUE] created a shallow copy of dt, so dt and dt1 had different memory addresses. Columns could then be added to dt1 safely, without affecting dt and without the cost of copying any of its columns. This is particularly useful when dt is extremely large and different scripts need to compute different columns from it without copying it.
library(data.table)
dt <- data.table(id = 1:10)
dt1 <- dt[TRUE]
dt1[, x := 1]
dt2 <- dt[TRUE]
dt2[, x := 2]
dt
#> id
#> 1: 1
#> 2: 2
#> 3: 3
#> 4: 4
#> 5: 5
#> 6: 6
#> 7: 7
#> 8: 8
#> 9: 9
#> 10: 10
dt1
#> id x
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
#> 6: 6 1
#> 7: 7 1
#> 8: 8 1
#> 9: 9 1
#> 10: 10 1
dt2
#> id x
#> 1: 1 2
#> 2: 2 2
#> 3: 3 2
#> 4: 4 2
#> 5: 5 2
#> 6: 6 2
#> 7: 7 2
#> 8: 8 2
#> 9: 9 2
#> 10: 10 2
address(dt)
#> [1] "0x7f8655ef4200"
address(dt1)
#> [1] "0x7f8655f40000"
address(dt2)
#> [1] "0x7f8655eae200"
3937881 changes this behavior: dt[TRUE] no longer makes a shallow copy of dt, so the same code no longer works. dt, dt1 and dt2 now all refer to the same object.
library(data.table)
dt <- data.table(id = 1:10)
dt1 <- dt[TRUE]
dt1[, x := 1]
dt2 <- dt[TRUE]
dt2[, x := 2]
dt
#> id x
#> 1: 1 2
#> 2: 2 2
#> 3: 3 2
#> 4: 4 2
#> 5: 5 2
#> 6: 6 2
#> 7: 7 2
#> 8: 8 2
#> 9: 9 2
#> 10: 10 2
dt1
#> id x
#> 1: 1 2
#> 2: 2 2
#> 3: 3 2
#> 4: 4 2
#> 5: 5 2
#> 6: 6 2
#> 7: 7 2
#> 8: 8 2
#> 9: 9 2
#> 10: 10 2
dt2
#> id x
#> 1: 1 2
#> 2: 2 2
#> 3: 3 2
#> 4: 4 2
#> 5: 5 2
#> 6: 6 2
#> 7: 7 2
#> 8: 8 2
#> 9: 9 2
#> 10: 10 2
address(dt)
#> [1] "0x7fb92930be00"
address(dt1)
#> [1] "0x7fb92930be00"
address(dt2)
#> [1] "0x7fb92930be00"
Currently data.table:::shallow is not exported, so there is no way to shallow copy a data.table through the public API without losing its key. (Calling .subset(dt, ...) and then setDT will shallow copy dt, but its key will be lost.)
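To make the key-loss problem concrete, here is a minimal sketch of the .subset() plus setDT() workaround mentioned above. The small keyed table and the column names v and x are made up for illustration, and the sketch assumes that setDT() does not copy the columns of a plain list, which is how I understand it to behave.

```r
library(data.table)

# A small keyed table standing in for a very large one (illustrative data).
dt <- data.table(id = c(3L, 1L, 2L), v = c("c", "a", "b"), key = "id")

# .subset() returns a plain list pointing at the same column vectors and
# drops the data.table attributes, including the key. setDT() then turns
# that list into a data.table again (assumed not to copy the columns).
dt1 <- setDT(.subset(dt, seq_along(dt)))

# Adding a column to dt1 should leave dt untouched.
dt1[, x := 1]

names(dt)    # columns of the original table
names(dt1)   # columns of the shallow copy, now including x
key(dt)      # key of the original table
key(dt1)     # key of the shallow copy

# Compare the addresses of a shared column to check that it was not copied.
address(dt$v) == address(dt1$v)
```

The result behaves like the old dt[TRUE] for adding columns, but the key has to be restored separately on dt1, which is the gap this issue describes.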