ARROW-9856: [R] Add bindings for string compute functions#9423
ARROW-9856: [R] Add bindings for string compute functions#9423nealrichardson wants to merge 4 commits intoapache:masterfrom
Conversation
|
@github-actions crossbow submit homebrew-r-autobrew |
|
Revision: 66d81b82423cca02bf7c96a417de0b6613db9e99 Submitted crossbow builds: ursacomputing/crossbow @ actions-79
|
|
Aside from possibly adding more tests here (arguably redundant since the functions themselves are tested in C++, but perhaps worth more assurance that they behave as an R user would expect), this is done. Other TODOs have been made into followup JIRAs. |
r/tests/testthat/test-dplyr.R
Outdated
There was a problem hiding this comment.
This test would pass without the changes in this PR. Do you want to try to test toupper() and tolower() in a way that would have failed before but succeeds now?
There was a problem hiding this comment.
It's tricky with the R string functions like this that start with
if (!is.character(x))
x <- as.character(x)
because that will work by coercing the Array data silently.
Maybe we can test that by defining as.character <- function(...) stop("🙅") in the test?
There was a problem hiding this comment.
I looked into this. You can't just define as.character in the test because you need to replace the one that toupper calls. You can't use testthat::with_mock to mock base functions anymore, and my other hack of using trace to insert a crash seems to crash too many things (presumably legitimate places where as.character gets called elsewhere in the dplyr code). mockery might allow this more targeted but I haven't had great success setting it up before; there are other ways we could instrument our code to make sure the Arrow function is called, but I'm not sure it's worth it. It definitely won't work if you called toupper on a Dataset expression, so we get some related test coverage there.
There was a problem hiding this comment.
I did work out a test for this, PTAL
b55bd62 to
de9ae12
Compare
d4608a9 to
356c300
Compare
de9ae12 to
9603cb2
Compare
|
I'm going to merge this now. We're going to be iterating on this code for a while so I don't feel compelled to make everything perfect and thorough here. |
| collect(), | ||
| tbl | ||
| ), | ||
| NA) |
Includes dependency toolchain updates (utf8proc, re2) and bindings for
stringrbut three functions in arrow (utf8_ltrim_whitespace, utf8_rtrim_whitespace, utf8_trim_whitespace)