Skip to content

[bug] %in% statement fails if the category contains both lowercase and uppercase letters #2881

@ddong63

Description

@ddong63

In version 1.11.2, when using %in% and & statements together, %in% does not respect factor starting with a capitalized letter. Here is an example:

install.packages('data.table')
packageVersion("data.table")   # ‘1.11.2’
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')

[Issue]
After capitalizing the first letter in 'virginica', %in% statement cannot return to both groups when using a & statement, see below:

iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]

iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 0          0         25 

[Examples]
Tried with few examples below and they work fine.
If I subset on groups containing lowercases only, both groups were found.

iris[Species1 %in% c('setosa', 'versicolor') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25         25          0 

Or, if I add parenthesis to the either statement, both groups were found.

iris[(Species1 %in% c('setosa', 'Virginica')) & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 
iris[Species1 %in% c('setosa', 'Virginica') & (grp == 'B'), table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 

I tried this statement in subset function and it works.

table(subset(iris, Species1 %in% c('setosa', 'Virginica') & grp == 'B')$Species1)
# setosa versicolor  Virginica 
# 25          0         25 

This feature works in an older version data.table package (use version 1.10.4-3 as an example here):

devtools::install_version("data.table", version = "1.10.4-3", repos = "http://cran.us.r-project.org")

packageVersion("data.table")   # ‘1.10.4.3’
data("iris")
library(data.table)
iris <- data.table(iris)
iris$grp <- c('A', 'B')

iris[, Species1 := factor(Species, levels = c('setosa', 'versicolor', 'virginica'), labels = c('setosa', 'versicolor', 'Virginica'))]

iris[Species1 %in% c('setosa', 'Virginica') & grp == 'B', table(Species1)]
# Species1
# setosa versicolor  Virginica 
# 25          0         25 

[session info]

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4    yaml_2.1.18   

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions