
`on` performing slower than double `setkey` #1232

@MichaelChirico

Description

I recently integrated the new `on` functionality into some code of mine that was being dragged down by repetitive key switching (here for some context), so I was hoping `on` would speed things up. I was quite surprised to find that the code actually ran about 30% slower with `on` (45 minutes instead of 35).
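
For context, here are the two idioms side by side, as a minimal sketch on toy tables (the names dt1/dt2 are just for illustration):

library(data.table)
dt1 <- data.table(x = c("a", "b", "c"), v = 1:3)
dt2 <- data.table(x = c("b", "c"), w = 4:5)

# keyed join: physically sort both tables by x first, then join
setkey(dt1, x)
setkey(dt2, x)
dt1[dt2]

# ad hoc join: no setkey needed, the join column is passed via on=
dt1[dt2, on = "x"]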

I was able to reproduce this using large data.tables beefed up from @jangorecki's `join_on` tests:

library(data.table)
library(microbenchmark)  # for get_nanotime()

nn <- 1e6  # rows in DT1
mm <- 1e2  # rows in DT2

times <- 50L

set.seed(45L)
DT1 <- data.table(x=sample(letters[1:3], nn, TRUE), y=sample(6:10, nn, TRUE),
                  a=sample(100, nn, TRUE), b=runif(nn))
DT2 <- CJ(x=letters[1:3], y=6:10)[, mul := sample(20, 15)][sample(15L, mm, TRUE)]

times2 <- times1 <- numeric(times)
for (ii in 1:times) {
  # time the ad hoc join via on= (fresh, unkeyed copies each iteration)
  cp1 <- copy(DT1); cp2 <- copy(DT2)
  strt <- get_nanotime()
  cp1[cp2, on="x", allow.cartesian=TRUE]
  stp <- get_nanotime()
  times1[ii] <- stp - strt

  # time setkey on both tables plus the keyed join
  cp1 <- copy(DT1); cp2 <- copy(DT2)
  strt <- get_nanotime()
  setkey(cp1, x)[setkey(cp2, x), allow.cartesian=TRUE]
  stp <- get_nanotime()
  times2[ii] <- stp - strt
}
> median(times1)/median(times2)
[1] 1.274535

So about 27% slower here. Maybe I'm not understanding the purpose of `on`, but I thought the double-`setkey` approach should basically be an upper bound on how long `on` takes, since `on` skips physically reordering both tables.
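
One way to dig into where the extra time goes (a sketch; verbose = TRUE makes data.table print timings of the internal join steps, which should show whether the overhead is in computing the join order for i or in the merge itself):

# diagnostic only, not part of the benchmark loop above
cp1 <- copy(DT1); cp2 <- copy(DT2)
cp1[cp2, on = "x", allow.cartesian = TRUE, verbose = TRUE]

# and the keyed equivalent for comparison
cp1 <- copy(DT1); cp2 <- copy(DT2)
setkey(cp1, x)
setkey(cp2, x)
cp1[cp2, allow.cartesian = TRUE, verbose = TRUE]

And indeed, `on` is faster when the tables are smaller: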

nn <- 1e3

> median(times1)/median(times2)
[1] 0.9491699

So, roughly 5% faster when DT1 is smaller.

nn <- 1e6; mm <- 5
> median(times1)/median(times2)
[1] 0.9394226

And roughly 6% faster when `DT2` is smaller.
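
For what it's worth, the same comparison can be run through microbenchmark() directly (a sketch; the copy() calls fall inside the timed expressions here, so absolute times are inflated for both, but the relative on-vs-setkey comparison is what matters):

library(microbenchmark)
microbenchmark(
  on     = copy(DT1)[copy(DT2), on = "x", allow.cartesian = TRUE],
  setkey = setkey(copy(DT1), x)[setkey(copy(DT2), x), allow.cartesian = TRUE],
  times = 50L
)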
