ARROW-13766: [R] Add slice_*() methods#14361
Conversation
r/R/dplyr-slice.R
Outdated
There was a problem hiding this comment.
As @paleolimbot noted on ARROW-13766, it is possible to do with_ties but it requires doing the top-k calculation and then using that as a filtering join. I'm inclined either not to support it (as done here, before I saw that comment) or to flip the default value of with_ties and perhaps raise a warning the first time the function is called to note the difference. Thoughts?
thisisnic
left a comment
There was a problem hiding this comment.
Thanks, will be excellent to have these functions in 10.0.0 :D
A few bits and pieces I noticed when reviewing are below.
r/R/dplyr-slice.R
Outdated
There was a problem hiding this comment.
There's an error message for extraneous slice_*()-related arguments which dplyr has but we don't have that here:
library(dplyr)
library(arrow)
mtcars %>%
slice_tail(n = 3, with_ties = FALSE) %>%
collect()
#> Error in `slice_tail()`:
#> ! `...` must be empty.
#> ✖ Problematic argument:
#> • with_ties = FALSE
mtcars %>%
arrow_table() %>%
slice_tail(n = 3, with_ties = FALSE) %>%
collect()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
#> 2 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 3 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2There was a problem hiding this comment.
Good catch, added the rlang::check_dots_empty call. I left it as rlang:: instead of adding it to the importFrom we use because I thought you had some PRs open that were touching that, can move there before merging. (Also need to re-run the docgen.R anyway.)
There was a problem hiding this comment.
I had a check, and none of my open PRs are touching that, so good to move whenever.
57a6760 to
f042e57
Compare
|
@thisisnic thanks for your excellent review. I've addressed your feedback and pushed a little more to unlock |
44d50d2 to
d20afba
Compare
Co-authored-by: Nic Crane <[email protected]>
…without projecting it first)
7e4857f to
e62b0f7
Compare
|
@thisisnic want to give this one more look? |
|
Benchmark runs are scheduled for baseline = b5b41cc and contender = 80e3986. 80e3986 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
This PR implements
slice_head,()slice_tail(),slice_min(),slice_max()andslice_sample().slice_sample()requires a clever hack using a UDF because therandom()C++ function apparently does not work; see ARROW-17974.