ARROW-13853: [R] String to_title, to_lower, to_upper kernels#11232
ARROW-13853: [R] String to_title, to_lower, to_upper kernels#11232edponce wants to merge 14 commits intoapache:masterfrom
Conversation
180d12f to
f6c29d0
Compare
|
@github-actions crossbow submit test-r-linux-as-cran |
|
Revision: f6c29d0379efb31fe4e41c959356bc484c4ca43c Submitted crossbow builds: ursacomputing/crossbow @ actions-859
|
jonkeane
left a comment
There was a problem hiding this comment.
Overall, this looks good — a few comments and running our as-cran crossbow build to confirm that this doesn't error on that.
a53004b to
42f6860
Compare
jonkeane
left a comment
There was a problem hiding this comment.
Something is up with the tests, I've added a suggestion that should get around that. If it persists or you want me to dig into this deeper let me know and I'll pull the branch locally and figure out what's going wrong here.
|
Thanks all for the great reviews! |
|
Interesting...the failed tests are because # R stringr
> str_to_title("1Foo1") # "1foo1"
# Python
>>> "1Foo1".title() # "1Foo1"I conclude that this also applies for "is_title()" function because it ignores beginning integers...well... What to do in such a case? cc @jonkeane @nealrichardson |
|
Hmm, practically speaking I suspect this won't matter to all that many people. We have a few options:
I suspect that 2 is probably the easiest right now. |
|
And if you're looking for other ways this could be even more hairy: https://stat.ethz.ch/R-manual/R-devel/library/tools/html/toTitleCase.html
|
|
Yeah, when I was implementing the For this PR, the R binding calls the C++ kernels directly, so it will produce different results for extreme cases when compared to stringr. If users find issues or request more complex functionality, then this can be revisited in a future PR. |
nealrichardson
left a comment
There was a problem hiding this comment.
CI failure is unrelated (failure to install an optional dependency)
This PR adds locale validation to string functions matching stringr: str_to_title, str_to_lower, and str_to_upper.
In Arrow C++, these kernels do not support a locale argument so only the default R locale ("en") is currently supported.
Closes apache#11232 from edponce/ARROW-13853-String-title-case-kernel
Authored-by: Eduardo Ponce <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
This PR adds locale validation to string functions matching stringr: str_to_title, str_to_lower, and str_to_upper.
In Arrow C++, these kernels do not support a locale argument so only the default R locale ("en") is currently supported.