-
Notifications
You must be signed in to change notification settings - Fork 1k
Closed
Description
In version 1.11.2, when using %in% and & statements together, %in% does not respect factor starting with a capitalized letter. Here is an example:
install.packages('data.table')
packageVersion("data.table") # ‘1.11.2’
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')
[Issue]
After capitalizing the first letter in 'virginica', %in% statement cannot return to both groups when using a & statement, see below:
iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]
iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor Virginica
# 0 0 25
[Examples]
Tried with few examples below and they work fine.
If I subset on groups containing lowercases only, both groups were found.
iris[Species1 %in% c('setosa', 'versicolor') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor Virginica
# 25 25 0
Or, if I add parenthesis to the either statement, both groups were found.
iris[(Species1 %in% c('setosa', 'Virginica')) & grp == 'B', table(Species1)]
# Species1
# setosa versicolor Virginica
# 25 0 25
iris[Species1 %in% c('setosa', 'Virginica') & (grp == 'B'), table(Species1)]
# Species1
# setosa versicolor Virginica
# 25 0 25
I tried this statement in subset function and it works.
table(subset(iris, Species1 %in% c('setosa', 'Virginica') & grp == 'B')$Species1)
# setosa versicolor Virginica
# 25 0 25
This feature works in an older version data.table package (use version 1.10.4-3 as an example here):
devtools::install_version("data.table", version = "1.10.4-3", repos = "http://cran.us.r-project.org")
packageVersion("data.table") # ‘1.10.4.3’
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')
iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]
iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor Virginica
# 25 0 25
[session info]
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.2
loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4 yaml_2.1.18