Skip to content

Commit ebc5bc3

Browse files
authored
fix variable with melt(measure.vars=list), na.rm=T/F consistency (#4723)
1 parent 7b4bd7d commit ebc5bc3

File tree

4 files changed

+11
-15
lines changed

4 files changed

+11
-15
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,8 @@
8080
8181
12. `as.data.table(table(NULL))` now returns `data.table(NULL)` rather than error `attempt to set an attribute on NULL`, [#4179](https://github.com/Rdatatable/data.table/issues/4179). The result differs slightly to `as.data.frame(table(NULL))` (0-row, 1-column) because 0-column works better with other `data.table` functions like `rbindlist()`. Thanks to Michael Chirico for the report and fix.
8282
83+
13. `melt` with a list for `measure.vars` would output `variable` inconsistently between `na.rm=TRUE` and `FALSE`, [#4455](https://github.com/Rdatatable/data.table/issues/4455). Thanks to @tdhock for reporting and fixing.
84+
8385
## NOTES
8486
8587
1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :

inst/tests/tests.Rraw

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3014,6 +3014,9 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x))
30143014
error="Unknown 'id.vars' type raw")
30153015
test(1035.012, melt(DT, id.vars=1:3, measure.vars=as.raw(0)),
30163016
error="Unknown 'measure.vars' type raw")
3017+
test(1035.013, melt(data.table(a=1, b=1), id.vars=c(1,1)), data.table(a=1, a.1=1, variable=factor("b"), value=1))
3018+
test(1035.014, melt(data.table(a1=1, b1=1, b2=2), na.rm=TRUE, measure.vars=list(a="a1", b=c("b1","b2"))), data.table(variable=factor(1,c("1","2")), a=1, b=1))
3019+
test(1035.015, melt(data.table(a=1+2i, b=1), id.vars="a"), error="Unknown column type 'complex' for column 'a' in 'data'")
30173020

30183021
ans1 = cbind(DT[, c(1,2,8), with=FALSE], variable=factor("l_1"))
30193022
ans1[, value := DT$l_1]
@@ -3175,9 +3178,9 @@ Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1")
31753178
x[, c("y1","z1"):=NA]
31763179
test(1037.405, dim(melt(x, measure.vars=patterns("^y", "^z"))), INT(4,5))
31773180
test(1037.406, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE)), INT(2,5))
3178-
test(1037.407, ans$variable, factor(c("1","1")))
3181+
test(1037.407, ans$variable, factor(c("2","2"), c("1", "2")))
31793182
test(1037.408, dim(ans<-melt(x, measure.vars=patterns("^y", "^z"), na.rm=TRUE, variable.factor=FALSE)), INT(2,5))
3180-
test(1037.409, ans$variable, c("1","1"))
3183+
test(1037.409, ans$variable, c("2","2"))
31813184

31823185
test(1037.410, melt(data.table(NULL), verbose=TRUE), data.table(NULL),
31833186
output="ncol(data) is 0. Nothing to melt")

man/melt.data.table.Rd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ non-measure columns will be assigned to it. If integer, must be positive; see De
3131
}
3232
3333
For convenience/clarity in the case of multiple \code{melt}ed columns, resulting column names can be supplied as names to the elements \code{measure.vars} (in the \code{list} and \code{patterns} usages). See also \code{Examples}. }
34-
\item{variable.name}{name for the measured variable names column. The default name is \code{'variable'}.}
34+
\item{variable.name}{name (default \code{'variable'}) of output column containing information about which input column(s) were melted. If \code{measure.vars} is an integer/character vector, then each entry of this column contains the name of a melted column from \code{data}. If \code{measure.vars} is a list of integer/character vectors, then each entry of this column contains an integer indicating an index/position in each of those vectors.}
3535
\item{value.name}{name for the molten data values column(s). The default name is \code{'value'}. Multiple names can be provided here for the case when \code{measure.vars} is a \code{list}, though note well that the names provided in \code{measure.vars} take precedence. }
3636
\item{na.rm}{If \code{TRUE}, \code{NA} values will be removed from the molten
3737
data.}

src/fmelt.c

Lines changed: 3 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -538,19 +538,18 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str
538538
} else {
539539
for (int j=0, ansloc=0, level=1; j<data->lmax; ++j) {
540540
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
541-
if (thislen==0) continue; // so as not to bump level
542541
char buff[20];
543542
snprintf(buff, 20, "%d", level++);
544543
SEXP str = PROTECT(mkChar(buff));
545544
for (int k=0; k<thislen; ++k) SET_STRING_ELT(target, ansloc++, str);
546545
UNPROTECT(1);
547546
}
548547
}
549-
} else {
548+
} else {// varfactor==TRUE
550549
SET_VECTOR_ELT(ansvars, 0, target=allocVector(INTSXP, data->totlen));
551550
SEXP levels;
552551
int *td = INTEGER(target);
553-
if (data->lvalues == 1) {
552+
if (data->lvalues == 1) {//single output column.
554553
SEXP thisvaluecols = VECTOR_ELT(data->valuecols, 0);
555554
int len = length(thisvaluecols);
556555
levels = PROTECT(allocVector(STRSXP, len)); protecti++;
@@ -573,24 +572,16 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str
573572
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
574573
for (int k=0; k<thislen; ++k) td[ansloc++] = md[j];
575574
}
576-
} else {
575+
} else {//multiple output columns.
577576
int nlevel=0;
578577
levels = PROTECT(allocVector(STRSXP, data->lmax)); protecti++;
579578
for (int j=0, ansloc=0; j<data->lmax; ++j) {
580579
const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow;
581-
if (thislen==0) continue; // so as not to bump level
582580
char buff[20];
583581
snprintf(buff, 20, "%d", nlevel+1);
584582
SET_STRING_ELT(levels, nlevel++, mkChar(buff)); // generate levels = 1:nlevels
585583
for (int k=0; k<thislen; ++k) td[ansloc++] = nlevel;
586584
}
587-
if (nlevel < data->lmax) {
588-
// data->narm is true and there are some all-NA items causing at least one 'if (thislen==0) continue' above
589-
// shrink the levels
590-
SEXP newlevels = PROTECT(allocVector(STRSXP, nlevel)); protecti++;
591-
for (int i=0; i<nlevel; ++i) SET_STRING_ELT(newlevels, i, STRING_ELT(levels, i));
592-
levels = newlevels;
593-
}
594585
}
595586
setAttrib(target, R_LevelsSymbol, levels);
596587
setAttrib(target, R_ClassSymbol, ScalarString(char_factor));

0 commit comments

Comments
 (0)