
joins with type coercion: 1.5 == 1 is TRUE #2592

@MarkusBonsch


Maybe this is a known issue, but it came as a very bad surprise to me.
During a join, if the join column of `i` is coerced to integer to match the type of the corresponding column in `x`, the value 1.5 is joined to 1:

    library(data.table)
    dt1 <- data.table(x = 1L, y = 1)
    dt2 <- data.table(x = 1.5, z = 2)
    dt1[dt2, on = "x", verbose = TRUE]
    # Calculated ad hoc index in 0 secs
    # Coercing double column i.'x' to integer to match type of x.'x'. Please avoid coercion for efficiency.
    # Starting bmerge ...done in 0 secs
    #    x y z
    # 1: 1 1 2

    merge(dt2, dt1, by = "x")
    #    x y z
    # 1: 1 1 2
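Until the join itself coerces to a common type, one possible workaround is to promote the integer key in `x` to double before joining, so nothing in `i` gets truncated. This is a minimal sketch using the same `dt1`/`dt2` as above; `dt1_dbl` and `res` are hypothetical names, and `copy()` is used so that the original `dt1` is not modified by reference.

```r
library(data.table)

dt1 <- data.table(x = 1L, y = 1)
dt2 <- data.table(x = 1.5, z = 2)

# Promote the integer join column to double on a copy, leaving dt1 untouched
dt1_dbl <- copy(dt1)[, x := as.numeric(x)]

# Both join columns are now double, so 1.5 no longer matches 1 and y is NA
res <- dt1_dbl[dt2, on = "x"]
res
```

The cost is that the join no longer runs on integer keys, and the caller has to remember to do this for every mixed-type join.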

The cause is the unconditional coercion of dt2$x to integer during bmerge, which truncates 1.5 to 1.
We should adopt the base R logic used by merge.data.frame (through match()), where the join columns are coerced to the 'highest' involved type; see lines 902ff of src/main/unique.c (https://github.com/wch/r-source/blob/e690b0d6998dfbc360f0fa14492eb8648df20949/src/main/unique.c):

    /* Coerce to a common type; type == NILSXP is ok here.
     * Note that above we coerce factors and "POSIXlt", only to character.
     * Hence, coerce to character or to `higher' type
     * (given that we have "Vector" or NULL) */
    if(TYPEOF(x) >= STRSXP || TYPEOF(table) >= STRSXP) type = STRSXP;
    else type = TYPEOF(x) < TYPEOF(table) ? TYPEOF(table) : TYPEOF(x);
    PROTECT(x     = coerceVector(x,     type)); nprot++;
    PROTECT(table = coerceVector(table, type)); nprot++;
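The rule above can be sketched at the R level for a single pair of key vectors. This is a minimal illustration, not data.table's or base R's code: `coerce_join_keys` is a hypothetical helper that converts factors to character (as the C comment describes) and then promotes both keys to the higher of the two types, using the SEXPTYPE order LGLSXP < INTSXP < REALSXP < CPLXSXP < STRSXP. It assumes plain atomic vectors and ignores other classed types such as Date or POSIXlt.

```r
# Hypothetical helper: coerce two join-key vectors to their common ('highest') type
coerce_join_keys <- function(a, b) {
  # Factors are compared as character, mirroring the comment in unique.c
  if (is.factor(a)) a <- as.character(a)
  if (is.factor(b)) b <- as.character(b)
  # Rank follows the SEXPTYPE ordering used by TYPEOF() comparisons in match()
  rank <- c(logical = 1L, integer = 2L, double = 3L, complex = 4L, character = 5L)
  target <- if (rank[[typeof(a)]] >= rank[[typeof(b)]]) typeof(a) else typeof(b)
  list(a = as.vector(a, mode = target), b = as.vector(b, mode = target))
}

i_key <- 1.5  # double key, as in dt2$x
x_key <- 1L   # integer key, as in dt1$x

# Current behaviour: i's key is coerced to x's type, 1.5 truncates to 1L and matches
match(as.integer(i_key), x_key)
# [1] 1

# Proposed behaviour: both keys are promoted to double, so there is no match
keys <- coerce_join_keys(i_key, x_key)
match(keys$a, keys$b)
# [1] NA
```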
