Description
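CountDistinct and SumDistinct should first do a partial aggregation within each partition and return the set of unique values found there as the partial result. The final aggregation then merges these sets. When a column has only a few unique values, this greatly reduces shuffle I/O, because each partition ships at most one copy of each value instead of every row.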
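The proposal can be illustrated with a minimal sketch in plain Python. This is not Spark code: partitions are modelled as ordinary lists, and the names `partial_distinct`, `merge_distinct`, `count_distinct`, and `sum_distinct` are hypothetical helpers introduced only for illustration. It assumes the values are hashable and, for the sum, numeric.

```python
from typing import Iterable, List, Set


def partial_distinct(partition: Iterable[int]) -> Set[int]:
    """Partial aggregation: reduce one partition to its set of unique values."""
    return set(partition)


def merge_distinct(partials: Iterable[Set[int]]) -> Set[int]:
    """Final aggregation: union the per-partition sets after the shuffle."""
    merged: Set[int] = set()
    for partial in partials:
        merged |= partial
    return merged


def count_distinct(partitions: List[List[int]]) -> int:
    """Count unique values across all partitions."""
    return len(merge_distinct(partial_distinct(p) for p in partitions))


def sum_distinct(partitions: List[List[int]]) -> int:
    """Sum unique values across all partitions."""
    return sum(merge_distinct(partial_distinct(p) for p in partitions))


# Three partitions holding 13 rows but only 3 unique values overall.
partitions = [[1, 1, 2, 2, 2], [2, 3, 3, 3], [1, 3, 3, 1]]

# Values that would cross the shuffle boundary in each strategy.
shuffled_without_partial = sum(len(p) for p in partitions)
shuffled_with_partial = sum(len(partial_distinct(p)) for p in partitions)

print(count_distinct(partitions))   # 3
print(sum_distinct(partitions))     # 6
print(shuffled_without_partial)     # 13
print(shuffled_with_partial)        # 6
```

In this toy input the partial sets carry 6 values across the shuffle instead of 13 rows. The saving grows with the number of rows per partition and shrinks as the number of unique values approaches the row count, in which case the partial sets are nearly as large as the raw data.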