Skip to content

Properly handling the "variable" column after melting unbalanced wide data #2575

@mrdwab

Description

@mrdwab

In reference to my comments at #2551, using patterns with melt on unbalanced wide or panel data results in an incorrect molten dataset.

Here's a minimal example, where "a" values are missing "beta" times, and "b" values are missing "alpha" times:

library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = c(1:3)
)

melt just sees that there are two sets of variables:

melt(DT, measure.vars = patterns("a_", "b_"))
#    id variable     value1       value2
# 1:  1        1 -0.1183107  1.682018318
# 2:  2        1  1.2370906 -0.573611132
# 3:  3        1  0.8088209 -0.057320032
# 4:  1        2 -0.7656264 -0.706428227
# 5:  2        2 -0.5919939  0.001899857
# 6:  3        2  0.5279071  1.063851211

Expected behaviour:

melt(DT, "id")[, c("stub", "var") := tstrsplit(variable, "_")][, dcast(.SD, id + var ~ stub)]
#    id   var          a            b
# 1:  1 alpha -0.1183107           NA
# 2:  1  beta         NA  1.682018318
# 3:  1 gamma -0.7656264 -0.706428227
# 4:  2 alpha  1.2370906           NA
# 5:  2  beta         NA -0.573611132
# 6:  2 gamma -0.5919939  0.001899857
# 7:  3 alpha  0.8088209           NA
# 8:  3  beta         NA -0.057320032
# 9:  3 gamma  0.5279071  1.063851211

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions