
Silently in-place merge last row with on= when multiple rows match #3747

@renkun-ken

Description


`data[dt, x := y, on = "group"]` adds columns from `dt` to `data` in place, matching rows on `group`. Currently, if `dt` has multiple rows that match the same `group`, the last matching row appears to be used, silently. I'm not sure whether this is by design, but in most cases such multiple matches are simply a mistake on my part. They mean I should look more closely at why `dt` has several rows per `group`, which often indicates that I built `dt` incorrectly or that something is missing from it.

A minimal example follows:

```r
library(data.table)

d1 <- data.table(id = 1:10, x = 1:10)
d1
#>     id  x
#>  1:  1  1
#>  2:  2  2
#>  3:  3  3
#>  4:  4  4
#>  5:  5  5
#>  6:  6  6
#>  7:  7  7
#>  8:  8  8
#>  9:  9  9
#> 10: 10 10
d2 <- data.table(id = c(1, 1, 2, 3, 3), y = c(1, 2, 2, 3, 4))
d2
#>    id y
#> 1:  1 1
#> 2:  1 2
#> 3:  2 2
#> 4:  3 3
#> 5:  3 4
d1[d2, y := y, on = "id"]
d1
#>     id  x  y
#>  1:  1  1  2
#>  2:  2  2  2
#>  3:  3  3  4
#>  4:  4  4 NA
#>  5:  5  5 NA
#>  6:  6  6 NA
#>  7:  7  7 NA
#>  8:  8  8 NA
#>  9:  9  9 NA
#> 10: 10 10 NA
```
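One way to see why the last row appears to win: with `which = TRUE`, the join returns, for each row of `d2`, the row of `d1` it matches. Replaying the assignment with base R sub-assignment, where a later write to the same index overwrites an earlier one, reproduces the `y` column shown above. This is an emulation for illustration only, an assumption about the observable behaviour, not data.table's internal implementation.

```r
library(data.table)

d1 <- data.table(id = 1:10, x = 1:10)
d2 <- data.table(id = c(1, 1, 2, 3, 3), y = c(1, 2, 2, 3, 4))

# For each row of d2 (in d2's order), the row number of d1 it matches.
# Rows 1-2 of d2 both hit d1 row 1; rows 4-5 both hit d1 row 3.
idx <- d1[d2, on = "id", which = TRUE]

# Emulation (not data.table internals): write d2$y into those rows in order.
# With duplicate indices, base R keeps the value written last.
y_emulated <- rep(NA_real_, nrow(d1))
y_emulated[idx] <- d2$y

# The actual update join from the issue, for comparison.
d1[d2, y := y, on = "id"]
identical(d1$y, y_emulated)
```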

Would it make sense to raise a signal (for example, a warning) in such cases, so that the problem can be spotted earlier?
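Until such a signal exists, a minimal sketch of doing the check by hand before the update join is shown here. `check_unique_keys()` is a hypothetical helper, not part of data.table; it uses data.table's `anyDuplicated()` method with `by =` to detect repeated join keys and stops with an error. Deduplicating with `unique(d2, by = "id")`, which keeps the first row per `id`, is only one possible resolution; which row is correct depends on why the duplicates exist in the first place.

```r
library(data.table)

# Hypothetical helper (not a data.table function): error out if the lookup
# table `dt` has more than one row for any combination of the join `keys`.
check_unique_keys <- function(dt, keys) {
  if (anyDuplicated(dt, by = keys) > 0L) {  # 0 means all keys are unique
    counts <- dt[, .N, by = keys]           # rows per key value
    n_bad <- sum(counts$N > 1L)
    stop("lookup table has ", n_bad, " key value(s) with multiple rows on ",
         paste(keys, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

d1 <- data.table(id = 1:10, x = 1:10)
d2 <- data.table(id = c(1, 1, 2, 3, 3), y = c(1, 2, 2, 3, 4))

# The duplicated ids (1 and 3) are reported instead of silently merged.
res <- tryCatch(check_unique_keys(d2, "id"), error = conditionMessage)
print(res)

# One possible fix: keep the first row per id, then join safely.
d3 <- unique(d2, by = "id")
if (check_unique_keys(d3, "id")) d1[d3, y := y, on = "id"]
```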

Labels: joins