@Loaki07 Loaki07 commented Nov 4, 2025

Description

  • Because a stream field has very high cardinality, the cache interval is set to 0 and streaming_aggs is therefore false.
  • In this case only, the partitions are generated by the normal logic with a mini partition, which is not valid for aggregate queries.
  • This creates duplicates, because each partition could have the same data.
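
To illustrate the duplication described above, here is a minimal sketch in Python. It is not the project's code: the rows, the `count_by_bucket` helper, and the `run_partitioned` helper are all hypothetical, and it assumes one plausible reading of the problem, namely that the aggregate is run once per time partition and the per-partition results are concatenated without a final merge.

```python
from collections import defaultdict

# Hypothetical rows: (timestamp, bucket). Not taken from the "oly" stream.
ROWS = [
    (1, "Europe"),
    (2, "Asia"),
    (6, "Europe"),
    (7, "Asia"),
]


def count_by_bucket(rows):
    """Rough stand-in for: SELECT bucket, COUNT(*) ... GROUP BY bucket ORDER BY bucket."""
    counts = defaultdict(int)
    for _, bucket in rows:
        counts[bucket] += 1
    return sorted(counts.items())


def run_partitioned(rows, partitions):
    """Run the aggregate once per [start, end) partition and concatenate the results."""
    out = []
    for start, end in partitions:
        part = [r for r in rows if start <= r[0] < end]
        out.extend(count_by_bucket(part))
    return out


# One partition covering the whole range: one row per bucket.
single = run_partitioned(ROWS, [(0, 10)])

# Two partitions: every bucket shows up once per partition.
split = run_partitioned(ROWS, [(0, 5), (5, 10)])

print(single)
print(split)
```

With a single partition each bucket appears once; with two partitions each bucket appears twice with partial counts. Counts could be summed afterwards, but values such as approximate percentiles cannot be recombined by simple concatenation, which is why splitting is a problem for aggregates when streaming aggregation is off.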

Fix:

  • If the query is an aggregate (is_aggregate) and is_streaming_aggs is false, return a single partition.
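
The check can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the merged change: `generate_partitions` and its parameters are hypothetical names, and the "normal logic" is simplified to fixed-size steps (the mini partition is not modelled).

```python
def generate_partitions(start, end, step, is_aggregate, is_streaming_aggs):
    """Return a list of (start, end) time ranges to query.

    Hypothetical sketch: an aggregate query that cannot use streaming
    aggregation gets exactly one partition covering the full range.
    """
    if is_aggregate and not is_streaming_aggs:
        # Single partition: the aggregate sees all the data at once.
        return [(start, end)]

    if step <= 0:
        raise ValueError("step must be positive")

    # Simplified stand-in for the normal partitioning logic.
    partitions = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)
        partitions.append((cursor, nxt))
        cursor = nxt
    return partitions


# Aggregate without streaming aggs: one partition.
agg_no_streaming = generate_partitions(0, 10, 4, is_aggregate=True, is_streaming_aggs=False)

# Aggregate with streaming aggs, or a non-aggregate query: normal splitting.
agg_streaming = generate_partitions(0, 10, 4, is_aggregate=True, is_streaming_aggs=True)
non_agg = generate_partitions(0, 10, 4, is_aggregate=False, is_streaming_aggs=False)
```

The guard is placed first so that the other paths are left unchanged; only the aggregate, non-streaming case is affected.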

Impact:

Query:

SELECT
    approx_percentile_cont(CAST(gold_medals AS Float), 0.50) AS P50_Gold,
    approx_percentile_cont(CAST(gold_medals AS Float), 0.75) AS P75_Gold,
    approx_percentile_cont(CAST(gold_medals AS Float), 0.95) AS P95_Gold,
    approx_percentile_cont(CAST(total_medals AS Float), 0.50) AS P50_Total,
    approx_percentile_cont(CAST(total_medals AS Float), 0.75) AS P75_Total,
    approx_percentile_cont(CAST(total_medals AS Float), 0.95) AS P95_Total,
    CASE
        WHEN continent = 'EUR' THEN 'Europe'
        WHEN continent = 'ASI' THEN 'Asia'
        WHEN continent = 'AME' THEN 'Americas'
        WHEN continent = 'AFR' THEN 'Africa'
        WHEN continent = 'OCE' THEN 'Oceania'
        ELSE 'Other'
    END AS bucket,
    COUNT(_timestamp) as "count"
FROM "oly"
WHERE
    continent IN ('EUR', 'ASI', 'AME', 'AFR', 'OCE')
AND total_medals > 0
GROUP BY bucket
ORDER BY bucket

Results:

image (7)

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Nov 4, 2025
github-actions bot commented Nov 4, 2025

Failed to generate code suggestions for PR

@hengfeiyang hengfeiyang changed the title from "fix: aggregate query with highe cardinality should return single" to "fix: aggregate query with higher cardinality should return single" Nov 4, 2025
@Loaki07 Loaki07 merged commit 953772b into main Nov 4, 2025
58 of 62 checks passed
@Loaki07 Loaki07 deleted the fix/aggregate_query_huge_cardinality_single_partition_main branch November 4, 2025 23:18
Loaki07 added a commit that referenced this pull request Nov 4, 2025