IMHO, the typical use of %notin% is likely to be DT[lhs %notin% rhs, ...], where (1) rhs contains no missing values and (2) the user wants to return/modify the rows where lhs takes values not in rhs.
Also, I don't expect users to write something like !lhs %notin% rhs, since %in% already covers that case conveniently.
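User-defined %op% operators bind more tightly than unary `!` in R, so `!lhs %notin% rhs` parses as `!(lhs %notin% rhs)`, which is simply `lhs %in% rhs`. A minimal base-R check of that claim, using a hypothetical stand-in `%notin_demo%` defined as `!(x %in% table)` rather than data.table's own implementation:

```r
# Hypothetical stand-in for %notin%; NOT data.table's implementation.
`%notin_demo%` <- function(x, table) !(x %in% table)

x <- c(1L, 2L, 5L)
rhs <- 1:3

# %op% binds tighter than unary `!`, so the outermost call is `!`
parsed <- quote(!x %notin_demo% rhs)
print(identical(parsed[[1]], as.name("!")))      # [1] TRUE

# The double negation collapses back to a plain %in% test
double_negation <- !x %notin_demo% rhs
print(identical(double_negation, x %in% rhs))    # [1] TRUE
```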
For these reasons, I think it is better to be on the safe side and have DT[lhs %notin% rhs, ...] return/modify only the rows whose lhs values are non-missing and not in rhs. The user would then have to explicitly add NA to rhs if they also want to include rows with missing values.
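The proposed default can be sketched in base R with a hypothetical helper `notin_na_safe()` (not part of data.table): NA elements of lhs are never selected, and all other elements behave like `!(x %in% table)`. This only models the default case described above. The opt-in of adding NA to rhs is not modelled.

```r
# Hypothetical helper sketching the proposal; not part of data.table.
# Returns TRUE only for non-missing elements of x that do not occur in table.
notin_na_safe <- function(x, table) {
  !is.na(x) & !(x %in% table)
}

x <- c(1:3, NA, 4L, NA)
rhs <- 1:3

current  <- !(x %in% rhs)          # NA elements count as "not in rhs"
proposed <- notin_na_safe(x, rhs)  # NA elements are excluded

print(current)           # [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
print(proposed)          # [1] FALSE FALSE FALSE FALSE  TRUE FALSE

# Row indices an update such as y := z would touch under each rule
print(which(current))    # [1] 4 5 6
print(which(proposed))   # [1] 5
```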
Consider the following example:
library(data.table)
dt = data.table(x=c(1:3, NA, 4L, NA), y=1:6, z=10*c(3, 1, 4, 8, 3, 8))
dt
       x     y     z
   <int> <int> <num>
1:     1     1    30
2:     2     2    10
3:     3     3    40
4:    NA     4    80
5:     4     5    30
6:    NA     6    80
dt[x %notin% 1:3, y := z]
dt
       x     y     z
   <int> <int> <num>
1:     1     1    30
2:     2     2    10
3:     3     3    40
4:    NA    80    80
5:     4    30    30
6:    NA    80    80
I don't think users performing this operation expect the rows where x is NA to be modified.
So, even if %notin% is meant to be a more memory-efficient version of !lhs %in% rhs (IIRC), I think it would be better to handle missing values more safely.
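The NA rows get picked up because base R documents %in% as `match(x, table, nomatch = 0L) > 0L`, so it never returns NA: an NA in lhs that is absent from rhs yields FALSE, and negating that yields TRUE. A short base-R check follows. It assumes, without verifying data.table internals, that %notin% follows the same `!(x %in% table)` semantics, as the example output above suggests.

```r
x <- c(1:3, NA, 4L, NA)

# %in% is built on match() with nomatch = 0L, so its result is never NA
in_rhs <- match(x, 1:3, nomatch = 0L) > 0L
print(identical(in_rhs, x %in% 1:3))   # [1] TRUE
print(anyNA(x %in% 1:3))               # [1] FALSE

# Negating therefore marks the NA elements as "not in rhs"
print(!(x %in% 1:3))                   # [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
```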
P.S.: I also wonder whether it would be possible to export a functional alternative to %notin%, something like notin(x, table, nomatch=-1L).