Commit 6324c45

add DT() functional form data.table query
1 parent 4cf9289 commit 6324c45

4 files changed, with 19 additions and 2 deletions

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ export(setnafill)
 export(.Last.updated)
 export(fcoalesce)
 export(substitute2)
+export(DT) # mtcars |> DT(i,j,by) #4872
 
 S3method("[", data.table)
 S3method("[<-", data.table)

NEWS.md

Lines changed: 6 additions & 0 deletions
@@ -109,6 +109,12 @@
 
 21. `melt()` was pseudo generic in that `melt(DT)` would dispatch to the `melt.data.table` method but `melt(not-DT)` would explicitly redirect to `reshape2`. Now `melt()` is standard generic so that methods can be developed in other packages, [#4864](https://github.com/Rdatatable/data.table/pull/4864). Thanks to @odelmarcelle for suggesting and implementing.
 
+22. `DT(i, j, by, ...)` has been added, i.e. functional form of a `data.table` query, [#641](https://github.com/Rdatatable/data.table/issues/641) [#4872](https://github.com/Rdatatable/data.table/issues/4872). Thanks to Yike Lu and Elio Campitelli for filing requests, many others for comments and suggestions, and Matt Dowle for the PR. This enables the `data.table` general form query to be invoked on a `data.frame` without converting it to a `data.table` first. The class of the input object is retained.
+
+```R
+mtcars |> DT(mpg>20, .(mean_hp=mean(hp)), by=cyl)
+```
+
 ## BUG FIXES
 
 1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
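For comparison, the query in the NEWS example can also be written in the conversion-first form that `DT()` is meant to make unnecessary. The following is a minimal sketch using only `as.data.table()` and the usual `[` syntax; it assumes the `data.table` package is installed, and the variable name `res` is purely illustrative. Unlike the `DT()` form, which according to the entry above retains the class of the input, the result here is a `data.table`.

```R
library(data.table)

# Conversion-first form: copy the data.frame into a data.table, then query it.
#   i  = mpg > 20                 keep rows with mpg above 20
#   j  = .(mean_hp = mean(hp))    compute mean horsepower
#   by = cyl                      one result row per cylinder count
res = as.data.table(mtcars)[mpg > 20, .(mean_hp = mean(hp)), by = cyl]
print(res)
```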

R/data.table.R

Lines changed: 4 additions & 2 deletions
@@ -846,10 +846,10 @@ replace_dot_alias = function(e) {
 if (!is.na(nomatch)) irows = irows[irows!=0L] # TO DO: can be removed now we have CisSortedSubset
 if (length(allbyvars)) { ############### TO DO TO DO TO DO ###############
 if (verbose) catf("i clause present and columns used in by detected, only these subset: %s\n", brackify(allbyvars))
-xss = x[irows,allbyvars,with=FALSE,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends]
+xss = `[.data.table`(x,irows,allbyvars,with=FALSE,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends)
 } else {
 if (verbose) catf("i clause present but columns used in by not detected. Having to subset all columns before evaluating 'by': '%s'\n", deparse(by))
-xss = x[irows,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends]
+xss = `[.data.table`(x,irows,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends)
 }
 if (bysub %iscall% ':' && length(bysub)==3L) {
 byval = eval(bysub, setattr(as.list(seq_along(xss)), 'names', names(xss)), parent.frame())
@@ -1910,6 +1910,8 @@ replace_dot_alias = function(e) {
 setalloccol(ans) # TODO: overallocate in dogroups in the first place and remove this line
 }
 
+DT = `[.data.table` #4872
+
 .optmean = function(expr) { # called by optimization of j inside [.data.table only. Outside for a small speed advantage.
 if (length(expr)==2L) # no parameters passed to mean, so defaults of trim=0 and na.rm=FALSE
 return(call(".External",quote(Cfastmean),expr[[2L]], FALSE))
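The commit does not say why `x[irows, ...]` became an explicit `` `[.data.table`(x, irows, ...) `` call. A plausible reading is that once `DT()` can receive a plain `data.frame`, `x[...]` would dispatch on the class of `x` and no longer reach the `data.table` method. The base R sketch below illustrates that general S3 behaviour only; the class `myclass`, its method, and all variable names are hypothetical and are not part of `data.table`.

```R
# Hypothetical S3 method for `[`, for illustration only.
`[.myclass` = function(x, i, ...) {
  list(rows = i, via = "myclass method")
}

x = structure(list(a = 1:3), class = "myclass")  # carries the class attribute
y = list(a = 1:3)                                # same data, no class attribute

via_dispatch = x[1L]              # S3 dispatch on class(x) finds `[.myclass`
plain        = y[1L]              # no class, so the internal default `[` runs
via_direct   = `[.myclass`(y, 1L) # calling the method by name skips dispatch
```

Aliasing the method, as `DT = `[.data.table`` does in this commit, gives the same direct-call behaviour under an ordinary function name.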

man/data.table.Rd

Lines changed: 8 additions & 0 deletions
@@ -5,6 +5,7 @@
 \alias{Ops.data.table}
 \alias{is.na.data.table}
 \alias{[.data.table}
+\alias{DT}
 \alias{.}
 \alias{.(}
 \alias{.()}
@@ -217,6 +218,8 @@ The way to read this out loud is: "Take \code{DT}, subset rows by \code{i}, \emp
 # see ?assign to add/update/delete columns by reference using the same consistent interface
 }
 
+A \code{data.table} query may be invoked on a \code{data.frame} using functional form \code{DT(...)}, see examples. The class of the input is retained.
+
 A \code{data.table} is a \code{list} of vectors, just like a \code{data.frame}. However :
 \enumerate{
 \item it never has or uses rownames. Rownames based indexing can be done by setting a \emph{key} of one or more columns or done \emph{ad-hoc} using the \code{on} argument (now preferred).
@@ -431,6 +434,11 @@ dev.off()
 # using rleid, get max(y) and min of all cols in .SDcols for each consecutive run of 'v'
 DT[, c(.(y=max(y)), lapply(.SD, min)), by=rleid(v), .SDcols=v:b]
 
+# functional query DT(...)
+if (getRversion() >= "4.1.0") { # native pipe |> new in R 4.1.0
+mtcars |> DT(mpg>20, .(mean_hp=mean(hp)), by=cyl)
+}
+
 # Support guide and links:
 # https://github.com/Rdatatable/data.table/wiki/Support
