Skip to content

Add syntax for "subsetting join" #2158

@franknarf1

Description

@franknarf1

With X[!Y], we can subset to where X has rows not matching Y, but there is no analogue for subsetting to where X does match Y. A X[Y, nomatch=0] join will sort the result according to Y and recognize dupe rows in Y, so I need to do something like X[sort(unique(X[Y, nomatch=0, which=TRUE]))] instead (unless I'm forgetting some other way).

library(data.table)
X = data.table(id = c(3L, 1L, 2L, 1L, 1L), g = c("A", "A", "B", "B", "A"), v = (1:5)*10)
Y = data.table(id = c(1L, 1:3), g = "A")

X[Y, on=names(Y), nomatch=0] 
# gives id = 1 before 3, contrary to X's ordering
# gives id = 1 twice, reflecting Y, but the goal is to subset X

#    id g  v
# 1:  1 A 20
# 2:  1 A 50
# 3:  1 A 20
# 4:  1 A 50
# 5:  3 A 10


X[ sort(unique(X[Y, on=names(Y), nomatch=0, which=TRUE])) ]
# desired result

#    id g  v
# 1:  3 A 10
# 2:  1 A 20
# 3:  1 A 50


X[!Y, on=names(Y)]
# analogous much simpler code for not join

#    id g  v
# 1:  2 B 30
# 2:  1 B 40

So I'm looking for new syntax to make this less awkward, maybe something like

X[Y, on=names(Y), subset.join = TRUE]
# or
X[subset.join(Y), on=names(Y)]

Metadata

Metadata

Assignees

No one assigned

    Labels

    joinsUse label:"non-equi joins" for rolling, overlapping, and non-equi joins

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions