Add substring support for binary#1608
Conversation
fix some test for string array Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
|
|
||
| cases.into_iter().try_for_each::<_, Result<()>>( | ||
| |(array, start, length, expected)| { | ||
| let array = StringArray::from(array); |
There was a problem hiding this comment.
We did not cover LargeStringArray in the past.
| use std::cmp::Ordering; | ||
|
|
||
| fn generic_substring<OffsetSize: StringOffsetSizeTrait>( | ||
| fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>( |
There was a problem hiding this comment.
The implementation is similar to utf8_substring except that there is no char boundary checking.
There was a problem hiding this comment.
Rather than replicate quite so much code, I wonder if it would be possible to make generic_substring take a function to check char boundaries and and then pass in a function that does nothing
Maybe like(untested):
fn generic_substring<OffsetSize, F>(
array: &GenericBinaryArray<OffsetSize>,
start: OffsetSize,
length: Option<OffsetSize>,
check_char_boundary: F,
)
where
OffsetSize: StringOffsetSizeTrait,
F: Fn(OffsetSize) -> Result<OffsetSize>
{
...
}There was a problem hiding this comment.
I am afraid that BinaryArray and StringArray can not share one API because GenericBinaryArray and GenericStringArray are two different types.
Maybe we could use macro to extract some common codes.
There was a problem hiding this comment.
Got it -- makes sense -- duplication makes sense to me then
Codecov Report
@@ Coverage Diff @@
## master #1608 +/- ##
==========================================
+ Coverage 82.95% 83.00% +0.04%
==========================================
Files 193 193
Lines 55435 55571 +136
==========================================
+ Hits 45988 46125 +137
+ Misses 9447 9446 -1
Continue to review full report at Codecov.
|
Signed-off-by: remzi <[email protected]>
alamb
left a comment
There was a problem hiding this comment.
Thank you @HaoYang670 -- I went through the tests carefully and it looks good to me.
cc @viirya and @jhorstmann
| use std::cmp::Ordering; | ||
|
|
||
| fn generic_substring<OffsetSize: StringOffsetSizeTrait>( | ||
| fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>( |
There was a problem hiding this comment.
Rather than replicate quite so much code, I wonder if it would be possible to make generic_substring take a function to check char boundaries and and then pass in a function that does nothing
Maybe like(untested):
fn generic_substring<OffsetSize, F>(
array: &GenericBinaryArray<OffsetSize>,
start: OffsetSize,
length: Option<OffsetSize>,
check_char_boundary: F,
)
where
OffsetSize: StringOffsetSizeTrait,
F: Fn(OffsetSize) -> Result<OffsetSize>
{
...
}| fn with_nulls<T: 'static + Array + PartialEq + From<Vec<Option<&'static str>>>>( | ||
| ) -> Result<()> { | ||
| #[allow(clippy::type_complexity)] | ||
| fn with_nulls_generic_binary<O: BinaryOffsetSizeTrait>() -> Result<()> { |
There was a problem hiding this comment.
I know this follows string version with_nulls, just wondering why in with_nulls_... only edge cases are tested against. without_nulls_... has normal test cases but it doesn't include nulls.
viirya
left a comment
There was a problem hiding this comment.
Looks good. Thanks @HaoYang670
Which issue does this PR close?
Closes #1593 .
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
Update some docs