dt[TRUE] no longer performs shallow copy #3214

@renkun-ken

In previous versions of data.table, dt1 <- dt[TRUE] created a shallow copy of dt, so dt and dt1 had different memory addresses. It was therefore safe to add columns to dt1 without affecting dt, and without the cost of copying any of its columns. This is particularly useful when dt is extremely large and different scripts need to compute different columns from it without copying it.

library(data.table)

dt <- data.table(id = 1:10)
dt1 <- dt[TRUE]
dt1[, x := 1]
dt2 <- dt[TRUE]
dt2[, x := 2]
dt
#>     id
#>  1:  1
#>  2:  2
#>  3:  3
#>  4:  4
#>  5:  5
#>  6:  6
#>  7:  7
#>  8:  8
#>  9:  9
#> 10: 10
dt1
#>     id x
#>  1:  1 1
#>  2:  2 1
#>  3:  3 1
#>  4:  4 1
#>  5:  5 1
#>  6:  6 1
#>  7:  7 1
#>  8:  8 1
#>  9:  9 1
#> 10: 10 1
dt2
#>     id x
#>  1:  1 2
#>  2:  2 2
#>  3:  3 2
#>  4:  4 2
#>  5:  5 2
#>  6:  6 2
#>  7:  7 2
#>  8:  8 2
#>  9:  9 2
#> 10: 10 2

address(dt)
#> [1] "0x7f8655ef4200"
address(dt1)
#> [1] "0x7f8655f40000"
address(dt2)
#> [1] "0x7f8655eae200"

Commit 3937881 changes this behavior: dt[TRUE] no longer shallow copies dt, so the following code no longer works. dt, dt1, and dt2 are now the same object, and every := modifies all three.

library(data.table)

dt <- data.table(id = 1:10)
dt1 <- dt[TRUE]
dt1[, x := 1]
dt2 <- dt[TRUE]
dt2[, x := 2]
dt
#>     id x
#>  1:  1 2
#>  2:  2 2
#>  3:  3 2
#>  4:  4 2
#>  5:  5 2
#>  6:  6 2
#>  7:  7 2
#>  8:  8 2
#>  9:  9 2
#> 10: 10 2
dt1
#>     id x
#>  1:  1 2
#>  2:  2 2
#>  3:  3 2
#>  4:  4 2
#>  5:  5 2
#>  6:  6 2
#>  7:  7 2
#>  8:  8 2
#>  9:  9 2
#> 10: 10 2
dt2
#>     id x
#>  1:  1 2
#>  2:  2 2
#>  3:  3 2
#>  4:  4 2
#>  5:  5 2
#>  6:  6 2
#>  7:  7 2
#>  8:  8 2
#>  9:  9 2
#> 10: 10 2

address(dt)
#> [1] "0x7fb92930be00"
address(dt1)
#> [1] "0x7fb92930be00"
address(dt2)
#> [1] "0x7fb92930be00"

Currently data.table:::shallow is not exported, so there is no way to use the public API to shallow copy a data.table without losing its key. (.subset(dt, ...) followed by setDT will shallow copy dt, but the key is lost.)
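
For illustration, here is a minimal sketch of the .subset() plus setDT() workaround mentioned above, using a keyed table. The keyed example table is my own assumption for the demonstration; it only aims to show that the new table gets its own column list while the key is dropped.

```r
library(data.table)

# A small keyed table standing in for the large one (illustrative only).
dt <- data.table(id = 1:10, key = "id")

# .subset() bypasses `[.data.table` and returns a plain named list that
# points at the same column vectors; attributes other than names
# (including the key) are dropped.
# setDT() then turns that list into a data.table by reference.
dt1 <- setDT(.subset(dt, seq_along(dt)))
dt1[, x := 1]

names(dt)   # dt should not have gained the column x
names(dt1)  # dt1 has both id and x
key(dt)     # the original key is still set on dt
key(dt1)    # the key is gone on dt1

# Compare the addresses of the id column in both tables to check
# whether the column vector itself is shared rather than copied.
address(dt$id) == address(dt1$id)
```

The key could be restored with setkey(dt1, id), but that is an extra step that dt[TRUE] did not previously require.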
