Fix array_has_all and array_has_any with empty array#15039
Fix array_has_all and array_has_any with empty array#15039alamb merged 8 commits intoapache:mainfrom
Conversation
|
@jayzhan211 Could you help approve the test workflows and review the PR? Thanks in advance |
| ComparisonType::All => true, | ||
| ComparisonType::Any => false, |
There was a problem hiding this comment.
It's weird to me that All is true and Any is false but I agree it matches the descriptions:
array_has_all: Returns true if all elements of sub-array exist in array.
array_has_any: Returns true if any elements exist in both arrays.
I guess, from a set perspective, we have "true if the intersection of the two sets has the same length as needle" (all) and "true if the intersection of the two sets is non-empty" (any).
There was a problem hiding this comment.
D select array_has_any([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_any(main.list_value(1, 2, 3), main.list_value()) │
│ boolean │
├────────────────────────────────────────────────────────────┤
│ false │
└────────────────────────────────────────────────────────────┘
D select array_has_all([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_all(main.list_value(1, 2, 3), main.list_value()) │
│ boolean │
├────────────────────────────────────────────────────────────┤
│ true │
└────────────────────────────────────────────────────────────┘
LGTM
There was a problem hiding this comment.
Sorry, yes. I just meant it was strange intuition (I was rambling). I was not arguing correctness. I agree this is good.
| ComparisonType::All => true, | ||
| ComparisonType::Any => false, | ||
| }; | ||
| let result = BooleanArray::from(vec![result_value; haystack.len()]); |
There was a problem hiding this comment.
Using BooleanBuffer::new_set and BooleanBuffer::new_unset should be slightly more efficient.
There was a problem hiding this comment.
Using
if needle.values().len() == 0 {
return match comparison_type {
ComparisonType::All => Ok(Arc::new(BooleanBuffer::new_set(haystack.len()))),
ComparisonType::Any => Ok(Arc::new(BooleanBuffer::new_unset(haystack.len())))
};
}
Runs into
the trait bound `BooleanBuffer: arrow::array::Array` is not satisfied
the following other types implement trait `arrow::array::Array`:
&T
BooleanArray
DictionaryArray<T>
FixedSizeBinaryArray
FixedSizeListArray
GenericByteArray<T>
GenericByteViewArray<T>
GenericListArray<OffsetSize>
and 10 others
required for the cast from `std::sync::Arc<BooleanBuffer>` to `std::sync::Arc<dyn arrow::array::Array>`rustc[Click for full compiler diagnostic](rust-analyzer-diagnostics-view:/diagnostic%20message%20[0]?0#file:///Users/lu/Projects/Github/Work/datafusion/datafusion/functions-nested/src/array_has.rs)
So confusing about the different types loll
There was a problem hiding this comment.
You should use BooleanArray::from(BooleanBuffer::new_/* ... */)
There was a problem hiding this comment.
Thanks for the suggestion! Changed to use BooleanBuffer
Co-authored-by: Alex Huang <[email protected]>
| false false false false | ||
| false false false false | ||
|
|
||
| query BB |
There was a problem hiding this comment.
this query passes on main for me. In other words I don't think the test covers the bug fix 🤔
DataFusion CLI v46.0.0
> select array_has_all(make_array(1,2,3), []),
array_has_any(make_array(1,2,3), []);
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| array_has_all(make_array(Int64(1),Int64(2),Int64(3)),make_array()) | array_has_any(make_array(Int64(1),Int64(2),Int64(3)),make_array()) |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| true | false |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.024 seconds.
There was a problem hiding this comment.
Thanks for catching that! Sorry not quite familar with the testing system yet
main cargo run --bin datafusion-cli
DataFusion CLI v46.0.0
> select array_has_all(make_array('1','2','3'),[]), array_has_any(make_array('1','2','3'),[]);
Arrow error: Invalid argument error: RowConverter column schema mismatch, expected Utf8 got Int64
will add this test case
|
This PR appears to have some build / CI failures that are preventing it from merging so marking it as drat @LuQQiu can you please resolve the issues so it can be merged? |
|
THanks again ! |
|
Thanks for all the reviews and suggestions! |
Which issue does this PR close?
What changes are included in this PR?
Fix array_has_any and array_has_all with empty array input
Are these changes tested?
Added the unit test in array.slt
Are there any user-facing changes?
NA