Naming conflict and unexpected behavior with the functional form of data.table DT() #5129

@Kamgang-B

Issue 1: When working with a data.frame, assignment to a new variable does not affect the original dataset, while assignment to an existing variable (modification/update) does.

A = setDF(list(mpg = c(21, 21, 22.8, 21.4, 18.7), 
               cyl = c(6, 6, 4, 6, 8), 
               disp = c(160, 160, 108, 258, 360)))

# create a count column (output not shown)
A |> DT(, count := .N, by=cyl)   

print(A)  # does not contain the count column
#    mpg cyl disp
# 1 21.0   6  160
# 2 21.0   6  160
# 3 22.8   4  108
# 4 21.4   6  258
# 5 18.7   8  360

# modify an existing column
A |> DT(, disp := disp %% 100)

print(A) # disp has been modified
#    mpg cyl disp
# 1 21.0   6   60
# 2 21.0   6   60
# 3 22.8   4    8
# 4 21.4   6   58
# 5 18.7   8   60

I think the expectation when calling a data.table query on a data.frame is that it behaves the same way as on a data.table; that is, both adding new columns and modifying existing ones should affect the original data.frame. This would be particularly useful because it avoids having to reassign the result every time DT is used on a data.frame. It would also be consistent with what happens when using a data.table instead of a data.frame.
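
For comparison, here is a minimal sketch of the behaviour I would expect, written with the regular `[` form rather than DT(). It assumes that converting A to a data.table by reference with setDT() is acceptable; the data is the same as above, but built with data.frame() so the block is self-contained.

```r
library(data.table)

A <- data.frame(mpg  = c(21, 21, 22.8, 21.4, 18.7),
                cyl  = c(6, 6, 4, 6, 8),
                disp = c(160, 160, 108, 258, 360))

# Convert by reference so that := operates on the object bound to A
setDT(A)

# Adding a new column and updating an existing one both happen in place
A[, count := .N, by = cyl]
A[, disp := disp %% 100]

# Convert back by reference if a plain data.frame is needed afterwards
setDF(A)
print(A)
```

Here both the new count column and the modified disp column are present in A without any reassignment, which is what I would expect from DT() on a data.frame as well.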

Issue 2: Naming a data.frame or data.table D leads to errors when it is used with the DT function:

D <- copy(A)

D[1:3,] |> DT(D[4:5,], on="cyl")             # error
# Error: object of type 'closure' is not subsettable

A[1:3,] |> DT(A[4:5,], on="cyl")             # works

D |> DT(, names(D) := lapply(.SD, sort))      # error
# Error: LHS of := isn't column names ('character') or positions ('integer' or 'numeric')

A |> DT(, names(A) := lapply(.SD, sort))      # works
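
Both error messages are what one would see if, inside DT(), the name D resolved to a function (a closure) instead of the table. The sketch below reproduces the two symptoms in base R with a hypothetical stand-in function, D_fn. Which function DT() actually picks up is not established here; base R does ship one called D (stats::D, for symbolic derivatives), but that it is the culprit is only an assumption.

```r
# Hypothetical stand-in for whatever function the name D resolves to inside DT()
D_fn <- function(x) x

# Subsetting a closure fails, matching the first error above
msg <- tryCatch(D_fn[4:5, ], error = function(e) conditionMessage(e))
print(msg)

# names() of a closure is NULL, which is not a valid LHS for :=
lhs <- names(D_fn)
print(is.null(lhs))

# Base R ships a function named D in package stats; whether this is the
# object DT() finds is an assumption, not something confirmed in this report
print(is.function(stats::D))
```

If this is indeed the cause, giving the table any name that does not coincide with a function visible to DT() (as with A above) avoids the problem, but the name of the user's object should not matter in the first place.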

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.1

loaded via a namespace (and not attached):
 [1] zoo_1.8-9         compiler_4.1.0    htmltools_0.5.1.1 tools_4.1.0       xts_0.12.1        yaml_2.2.1       
 [7] rmarkdown_2.10    grid_4.1.0        knitr_1.33        xfun_0.23         digest_0.6.27     rlang_0.4.11     
[13] lattice_0.20-44   evaluate_0.14    
