in some cases setkey could save some more sort operations #1321

@jan-glx

Description

When setkey is called on a DT that already has a sorted attribute, the existing key should be reused where possible.
This matters most in the setkey(setkey(dt, x, y), x) case, but also in slightly more advanced cases such as setkey(setkey(dt, x, y), y, x). If you do not want to trust the sorted attribute as such, at least check whether a column marked as sorted really is sorted before sorting it.
-best, Jan
Example:

```r
library(data.table)

dtu = data.table(x = sample(1E6, 1E8, replace=TRUE),
                 y = sample(1E6, 1E8, replace=TRUE))
dts = setkey(copy(dtu),x,y)
# setkey() reorders by reference, so every call below also modifies dtu/dts in place
onUnsorted <- function() setkey(dtu,y,x)
onSorted <- function() setkey(dts,y,x)
# a stable sort by y keeps rows with equal y in x order, so only the "sorted" attribute needs updating
onSortedSmart <- function() setattr(setkey(dts,y),"sorted",c("y","x"))

identical(onUnsorted(), onSorted())
#[1] TRUE
identical(onSorted(), onSortedSmart())
#[1] TRUE

system.time(onUnsorted())
#   user  system elapsed 
#   0.47    0.10    0.56 
system.time(onSorted())
#   user  system elapsed 
#   0.50    0.07    0.58 
system.time(onSortedSmart())
#   user  system elapsed 
#   0.24    0.06    0.29 
```
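A rough sketch of what reusing the key could look like, written as a standalone helper rather than as a change to data.table itself. `reuse_setkeyv()` and `is_sorted_by()` are hypothetical names; only `key()`, `setkeyv()` and `setattr()` are real data.table functions. It generalizes the `onSortedSmart()` trick: if the requested key ends with a prefix of the existing key, a stable sort by the remaining leading columns is enough. This assumes `setkeyv()` sorts stably (ties keep their current row order), which `onSortedSmart()` also relies on. In those reuse branches the sorted attribute is trusted; otherwise a linear-time check runs before falling back to a full sort, and that check assumes numeric key columns without `NA`s.

```r
library(data.table)

# Hypothetical helper: TRUE if the rows of `dt` are already in ascending
# lexicographic order by `cols`. One linear pass per column, no sorting.
# Assumes numeric key columns without NAs.
is_sorted_by <- function(dt, cols) {
  n <- nrow(dt)
  if (n < 2L) return(TRUE)
  tied <- rep(TRUE, n - 1L)  # adjacent row pairs equal on all columns checked so far
  for (col in cols) {
    v <- dt[[col]]
    nxt <- v[-1L]
    cur <- v[-n]
    if (any(tied & nxt < cur)) return(FALSE)  # a later row belongs before an earlier one
    tied <- tied & nxt == cur
  }
  TRUE
}

# Hypothetical helper (not part of data.table): like setkeyv(), but reuses
# the existing key where possible. Assumes setkeyv() sorts stably.
reuse_setkeyv <- function(dt, cols) {
  old <- key(dt)
  # k = length of the longest suffix of `cols` that is also a prefix of the old key
  k <- 0L
  if (!is.null(old)) {
    for (j in seq_along(cols)) {
      suffix <- cols[j:length(cols)]
      if (length(suffix) <= length(old) && identical(old[seq_along(suffix)], suffix)) {
        k <- length(suffix)
        break
      }
    }
  }
  head_cols <- cols[seq_len(length(cols) - k)]
  if (length(head_cols) == 0L) {
    # e.g. key (x, y) -> (x): rows are already in order, only relabel
    setattr(dt, "sorted", cols)
  } else if (k > 0L) {
    # e.g. key (x, y) -> (y, x): a stable sort by y keeps ties in x order
    setkeyv(dt, head_cols)
    setattr(dt, "sorted", cols)
  } else if (is_sorted_by(dt, cols)) {
    # no usable key, but the rows are already sorted by cols: skip the sort
    setattr(dt, "sorted", cols)
  } else {
    setkeyv(dt, cols)
  }
  invisible(dt)
}
```

On a table keyed by (x, y), `reuse_setkeyv(dts, c("y", "x"))` takes the same route as `onSortedSmart()`, while `reuse_setkeyv(dts, "x")` only relabels the key without touching any rows.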
