Closed
Description
Maybe this is a known issue, but to me it came as a very bad surprise.
During a join, when a double column in i is coerced to integer to match the type of the corresponding column in x, the value 1.5 is joined to 1:
library(data.table)
dt1 <- data.table(x = 1L, y = 1)
dt2 <- data.table(x = 1.5, z = 2)
dt1[dt2, on = "x", verbose = TRUE]
# Calculated ad hoc index in 0 secs
# Coercing double column i.'x' to integer to match type of x.'x'. Please avoid coercion for efficiency.
# Starting bmerge ...done in 0 secs
#    x y z
# 1: 1 1 2
merge(dt2, dt1, by = "x")
#    x y z
# 1: 1 1 2
The reason is the blind coercion to integer of dt2$x during bmerge.
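The effect of such a coercion can be reproduced in base R without data.table. This is only a minimal sketch, assuming the coercion behaves like `as.integer()`, which truncates toward zero; the variable names are made up for illustration.

```r
# Assumption: the coercion applied during the join behaves like as.integer(),
# which silently drops the fractional part instead of failing or giving NA.
i_key <- 1.5   # key value from the double column (i side)
x_key <- 1L    # key value from the integer column (x side)

coerced <- as.integer(i_key)                 # fractional part is dropped

matched_after_coercion   <- coerced == x_key # compares the truncated value
matched_without_coercion <- i_key == x_key   # integer is promoted to double

print(c(after_coercion = matched_after_coercion,
        without_coercion = matched_without_coercion))
```

Comparing the values without truncation keeps 1.5 and 1 distinct, which is the behaviour one would expect from a join.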
We should adopt the base R logic of merge.data.frame, where the join columns are coerced to the 'highest' type involved (https://github.com/wch/r-source/blob/e690b0d6998dfbc360f0fa14492eb8648df20949/src/main/unique.c, lines 902 ff.):
/* Coerce to a common type; type == NILSXP is ok here.
* Note that above we coerce factors and "POSIXlt", only to character.
* Hence, coerce to character or to `higher' type
* (given that we have "Vector" or NULL) */
if(TYPEOF(x) >= STRSXP || TYPEOF(table) >= STRSXP) type = STRSXP;
else type = TYPEOF(x) < TYPEOF(table) ? TYPEOF(table) : TYPEOF(x);
PROTECT(x = coerceVector(x, type)); nprot++;
PROTECT(table = coerceVector(table, type)); nprot++;
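Seen from the R side, the rule in this excerpt means both keys are promoted to the wider type before matching. A rough sketch using base R `match()` and `merge()` on plain data frames; none of this is data.table code, and `df1`/`df2` are illustrative names mirroring the example above.

```r
# Base R promotes the integer key to double before matching,
# so the value 1.5 finds no partner in the integer vector.
idx <- match(1.5, 1L)
is.na(idx)   # TRUE: no match after promotion to double

# The same rule applies when merging plain data frames.
df1 <- data.frame(x = 1L,  y = 1)
df2 <- data.frame(x = 1.5, z = 2)
res <- merge(df2, df1, by = "x")   # inner join on x
nrow(res)                          # no rows: 1.5 is not joined to 1
```

Promoting to the wider type never loses information about the key, whereas narrowing double to integer can turn distinct keys into equal ones.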