Change ScalarValue::List to store ArrayRef#7629
Conversation
dc280bf to
73e78c0
Compare
a87d75b to
8ce6e93
Compare
|
I plan to give this a look later today or tomorrow. Thanks @jayzhan211 |
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- this is an epic PR. I am sorry for the delay in reviewing. It is hard for me to find enough contiguous time to review large PRs like this.
I left some comments. Other than the todo!() in PartialOrd (which would result in a panic) I think this PR is basically ready to go.
datafusion/common/src/scalar.rs
Outdated
|
|
||
| fn iter_to_array_list( | ||
| scalars: impl IntoIterator<Item = ScalarValue>, | ||
| // This function does not contains nulls but empty array instead. |
There was a problem hiding this comment.
what does "function not contain nulls" mean?
There was a problem hiding this comment.
NullBuffer for ListArray is None in this function, you can't get NULL in the array from this function, only an empty array is possible.
I tried to have one function for iter_to_array_list_without_nulls and iter_to_array_list but failed since one needs nulls, and another needs an empty array.
| in_data.data_type() | ||
| ) | ||
| array_agg: Vec<Vec<ScalarValue>>, | ||
| ) -> Result<Vec<Vec<Vec<ScalarValue>>>> { |
There was a problem hiding this comment.
Would it be possible to document what the three levels of Vec mean semantically ?
There was a problem hiding this comment.
Actually, I don't understand either 😢 Let me take a chance to learn more about aggregate.
There was a problem hiding this comment.
I left the comment from merge_ordered_arrays for now. I think there might be some space to improve OrderSensitiveArrayAggAccumulator, but it is better to do it in another PR.
| Some(vec![]), | ||
| vec![Field::new("item", DataType::Int16, true)].into(), | ||
| ), | ||
| // Should fail due to inconsistent types in the list |
There was a problem hiding this comment.
why were these tests removed? Do they pass now? If so, perhaps we can move them to the positive cases?
There was a problem hiding this comment.
Because it is not possible to build inconsistent types with ListArray, therefore I just removed the whole test.
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
96f6f7d to
577612b
Compare
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
|
@jayzhan211 please mark this PR as ready for review when you are ready for me to give it another look. |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
| // in_data is Vec<ScalarValue> where ScalarValue does not include ScalarValue::List | ||
| for in_data in array_agg.into_iter() { | ||
| let ordering = in_data.into_iter().map(|struct_vals| { | ||
| if let ScalarValue::Struct(Some(orderings), _) = struct_vals { |
There was a problem hiding this comment.
The reason we have 3 levels of Vec in orderings is because we got Vec\<ScalarValue> in Struct, but orderings seems to have only one ScalarValue, orderings.len() == 1 based on the test.
Is it reasonable to have >1 ScalarValue for orderings?
There was a problem hiding this comment.
Maybe @mustafasrepo or @metegenez can offer some guidance here
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- this is an epic PR and reflects a lot of work. Thank you for the contribution
| // in_data is Vec<ScalarValue> where ScalarValue does not include ScalarValue::List | ||
| for in_data in array_agg.into_iter() { | ||
| let ordering = in_data.into_iter().map(|struct_vals| { | ||
| if let ScalarValue::Struct(Some(orderings), _) = struct_vals { |
There was a problem hiding this comment.
Maybe @mustafasrepo or @metegenez can offer some guidance here
Thanks for your review! |
|
I merged this PR up from main and plan to merge it when it passes CI. I will also make a follow on PR to clean up the to/from proto serialization. |
|
Thanks again @jayzhan211 -- epic work. |
|
Awesome, thanks @jayzhan211 |
|
I believe we found another bug here: #7938 |
Which issue does this PR close?
Closes #7352
Closes #996
Rationale for this change
What changes are included in this PR?
Old ScalarValue::List is replaced with ListArray.
convert_array_to_scalar_vecandscalars_to_list_arrayare two core utilities function that transform between Vec and ListArray.Are these changes tested?
Are there any user-facing changes?