[SPARK-2554] CountDistinct and SumDistinct should do partial aggregation - ASF Jira

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.0.1, 1.0.2
Fix Version/s: 1.3.0
Component/s: SQL
Labels: None
Target Version/s: 1.3.0

Description

CountDistinct and SumDistinct should first perform a partial aggregation, returning the set of unique values in each partition as the partial result. This can greatly reduce shuffle IO when there are only a few unique values.
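A minimal, engine-independent sketch of the proposed two-phase plan, written in plain Python rather than against Spark's internals. The helper names `partial_distinct` and `merge_distinct` and the sample partitions are hypothetical and only illustrate the idea: each partition is collapsed to its set of unique values before the shuffle, and the sets are unioned afterwards.

```python
from typing import Iterable, List, Set


def partial_distinct(rows: Iterable[int]) -> Set[int]:
    # Hypothetical phase 1 (per partition, before the shuffle):
    # keep only the unique values of this partition.
    return set(rows)


def merge_distinct(partials: Iterable[Set[int]]) -> Set[int]:
    # Hypothetical phase 2 (after the shuffle): union the per-partition sets.
    merged: Set[int] = set()
    for values in partials:
        merged |= values
    return merged


# Hypothetical data: three partitions with many duplicates and few unique values.
partitions: List[List[int]] = [[1, 1, 2, 2, 2], [2, 3, 3, 3], [1, 1, 1, 3]]

partials = [partial_distinct(p) for p in partitions]
merged = merge_distinct(partials)

count_distinct = len(merged)  # COUNT(DISTINCT x)
sum_distinct = sum(merged)    # SUM(DISTINCT x)

# Rows that would cross the shuffle without and with the partial step.
rows_without_partial = sum(len(p) for p in partitions)
rows_with_partial = sum(len(s) for s in partials)

# Merging scalar partial sums instead of value sets double-counts values
# that appear in more than one partition.
naive_sum_distinct = sum(sum(s) for s in partials)

print(count_distinct, sum_distinct)             # 3 6
print(rows_without_partial, rows_with_partial)  # 13 6
print(naive_sum_distinct)                       # 12
```

The sketch also suggests why the partial results are value sets rather than partial counts or sums: a value such as `2` that occurs in two partitions must be counted and summed only once, which a plain scalar merge cannot guarantee. The shuffle saving grows with the ratio of rows to unique values per partition.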

Issue Links

links to

[Github] Pull Request #1935 (marmbrus)

[Github] Pull Request #3348 (ravipesala)

People

Assignee: Unassigned

Reporter: Cheng Lian

Votes: 0

Watchers: 4

Dates

Created: 17/Jul/14 08:29

Updated: 19/Dec/14 04:20

Resolved: 19/Dec/14 04:20