fix: inconsistent scalar types in DistinctArrayAggAccumulator state#7385
fix: inconsistent scalar types in DistinctArrayAggAccumulator state#7385alamb merged 3 commits intoapache:mainfrom
DistinctArrayAggAccumulator state#7385Conversation
|
|
||
| # additional count(1) forces array_agg_distinct instead of array_agg over aggregated by c2 data | ||
| statement ok | ||
| SELECT array_agg(distinct c2), count(1) FROM aggregate_test_100 |
There was a problem hiding this comment.
Can you verify the output as well?
There was a problem hiding this comment.
Unfortunately, we don't have array_sort / list_sort function yet, and output of array_agg(distinct) is non-deterministic, so absence of failures is the best option of SQL-level check (as I can see it)
There was a problem hiding this comment.
What about unnesting and then using order by?
There was a problem hiding this comment.
Done using cross-join and CTE for array indices
Co-authored-by: Metehan Yıldırım <[email protected]>
| (0..array.len()).try_for_each(|i| { | ||
| let scalar = ScalarValue::try_from_array(array, i)?; | ||
| if let ScalarValue::List(Some(values), _) = scalar { | ||
| self.values.extend(values); |
There was a problem hiding this comment.
👍 extend should do the deduplication. 👍
| # ---- | ||
| # [4, 2, 3, 5, 1] | ||
| query III | ||
| WITH indices AS ( |
There was a problem hiding this comment.
Maybe we can make an input that has at least one non distinct value too -- not just 1,2,3,4,5 but maybe repeat the value 1 a few times:
SELECT 1 AS idx UNION ALL
SELECT 1 AS idx UNION ALL
SELECT 1 AS idx UNION ALL
SELECT 2 AS idx UNION ALL
...
|
Thanks @korowa and @metesynnada |
Which issue does this PR close?
Closes #6743.
Rationale for this change
Currently
DistinctArrayAggAccumulatorstores its state as a set of scalar values and returns it as a single scalar list. The problem is that while merging state, accumulator, instead of adding list elements, simply adds the whole list to the state as a single value, which causes errors while executingevaluateon accumulator with merged state.What changes are included in this PR?
merge_statenow unpacks scalar list input and updates state with its elements.Are these changes tested?
DistinctArrayAggAccumulators have been added,array_agg(distinct col)has been added to sqllogictestsAre there any user-facing changes?
array_agg(distinct)should not fail in case query execution requires merging accumulator states (e. g. aggregation over multiple partitions)