ARROW-11515: [R] Bindings for strsplit#10190
ARROW-11515: [R] Bindings for strsplit#10190thisisnic wants to merge 38 commits intoapache:masterfrom
Conversation
3118928 to
6e6addc
Compare
b7953fa to
5615486
Compare
|
This PR now also implements |
Co-authored-by: Ian Cook <[email protected]>
|
I'm going to do a bit more testing and polishing and then merge this after the CI is green if I don't hear any objections |
Update: I found a few issues when reviewing which I'm going to try to fix now; I will describe in comments soon |
|
I think this is ready to merge now! The R 3.5 CI failure was happening in other recent PRs so I assume it's unrelated to any changes here. The commits I added yesterday were mostly just minor things. @thisisnic if you have questions or uncertainty about any of the commits I added, please let me know today |
Will do! |
|
|
||
| if (func_name == "split_pattern") { | ||
| using Options = arrow::compute::SplitPatternOptions; | ||
| int64_t max_splits = -1; |
There was a problem hiding this comment.
Rather than setting defaults manually, can you do auto out = std::make_shared<Options>(Options::Defaults()); and then just set them if they're provided? (We do this above for other types.)
There was a problem hiding this comment.
IIRC this didn't work when I tried it
There was a problem hiding this comment.
I can confirm that there is no Defaults() method in arrow::compute::SplitPatternOptions. Apparently some of the options classes just don't have one.
There was a problem hiding this comment.
They probably should be added in the C++ library, can you do that and/or make another JIRA?
There was a problem hiding this comment.
@thisisnic could you make a Jira for this please? Thanks!
There was a problem hiding this comment.
Thinking about this, @nealrichardson and @ianmcook , it may make sense for arrow::compute::SplitPatternOptions not to have defaults, given that the pattern is one of the options - it might not make sense to pre-specify a pattern to split the string on? Neither of the R functions strsplit nor stringr::str_split have defaults for this argument.
Could one perhaps make a case for pattern moving from being an option to instead being a second input, similarly to how arrow::compute::filter and arrow::compute::take operate? At that point, having defaults for SplitPatternOptions feels more natural.
There was a problem hiding this comment.
That all makes good sense to me but I'm not up to speed on what design conventions we follow in the C++ compute functions. Maybe it'd be best to start a Zulip thread to ask folks about this (Joris, Antonie, Maarten Breddels) to get a better idea of the best change to suggest (if any).
|
I opened #10271 to resolve the outstanding comments here. @nealrichardson PTAL |
This resolves a few outstanding comments raised in #10190 Closes #10271 from ianmcook/ARROW-12692 Authored-by: Ian Cook <[email protected]> Signed-off-by: Ian Cook <[email protected]>
This PR adds bindings for both
strsplit()andstringr::str_split()for dplyr