Add regexp_extract func#14282
Conversation
|
Thanks for your contribution! I've left some comments for your review. |
|
FYI @Omega359 noted this is quite similar to |
|
FYI Spark regex is not the same as Rust regex and can have different results |
There was a problem hiding this comment.
This is looking good.
- I may have missed it but I don't see tests for out of bounds indexes (negative #'s, index larger than match group count).
- The docs do not say whether the index is 0-indexed or 1-indexed.
- I would love to see a benchmark for this but that absolutely could be added as a followup PR
- The returning of Utf8View should be corrected - either just return Utf8/LargeUtf8 or fix the return to actually return utf8view if input is utf8view
My primary concern is there are two PR's for essentially the same functionality. I am not in a position to decide which should be included.
| &self.signature | ||
| } | ||
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { |
There was a problem hiding this comment.
Can you implement return_type_from_args with marking nullable true only if the any of the input types are nullable (similiar to this #14841)
|
@SKY-ALIN - are you able to look into the above review comments? |
|
I'll take a look at this later today @SKY-ALIN |
|
I spent some time today looking at this PR. Here are my thoughts:
As a note the regexp_substr PR (#14323) is about to expire and the author looks to have submitted it and abandoned it. I think there is a number of ideas that you've implemented in this PR that could be incorporated into a new PR perhaps based on the above mentioned PR? Just a thought if you are interested - I myself really do appreciate the work you've put into this PR ❤️ |
Which issue does this PR close?
Closes #14280.
Rationale for this change
Adding more functions
What changes are included in this PR?
The implementation of regexp_extract based on Spark regexp_extract
Are these changes tested?
Yes
Are there any user-facing changes?
No