Unexpected result for max of character variable by group #5331

@markseeto


I was surprised by this:

library(data.table)

DT <- data.table(group = c("g1", "g1", "g2", "g2"),
                 x = c("alice", "Bob", "carol", "david"))

DT
#    group     x
# 1:    g1 alice
# 2:    g1   Bob
# 3:    g2 carol
# 4:    g2 david

DT2 <- DT[, .(m1 = max(x)), by = "group"]

DT2
#    group    m1
# 1:    g1 alice
# 2:    g2 david

DT3 <- DT[, .(m1 = max(x), m2 = max(tolower(x))), by = "group"]

DT3
#    group    m1    m2
# 1:    g1   Bob   bob
# 2:    g2 david david

DT[group == "g1", max(x)]
# [1] "Bob"

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.2

loaded via a namespace (and not attached):
[1] compiler_4.1.2

Why are DT2$m1 and DT3$m1 different?

And why is DT2[group == "g1", m1] not the same as DT[group == "g1", max(x)]?

Thanks.
