perf(sql): rewrite trivial expressions over same column in GROUP BY queries by nwoolmer · Pull Request #4508 · questdb/questdb

nwoolmer · 2024-05-15T13:53:18Z

This relates to performance around Clickbench Q35.

For Clickbench Q35 on M2 Mac Mini, this speeds up the query from 1.7s to 0.9s.

The rewritten query runs using a Rosti implementation and an early limit, instead of async group by and a late limit.

With the rewrite applied but using async group by instead of Rosti, it runs in 1.18s.

Query:

SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c 
FROM hits 
GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 
ORDER BY c DESC LIMIT 10;

Before change:

Sort light lo: 10
  keys: [c desc]
    VirtualRecord
      functions: [ClientIP,column,column1,column2,c]
        Async Group By workers: 8
          keys: [ClientIP,column,column1,column2]
          values: [count(*)]
          filter: null
            DataFrame
                Row forward scan
                Frame forward scan on: hits

Execute: 1.78s

After change:

VirtualRecord
  functions: [ClientIP,ClientIP-1,ClientIP-2,ClientIP-3,c]
    Sort light lo: 10
      keys: [c desc]
        GroupBy vectorized: true workers: 8
          keys: [ClientIP]
          values: [count(*)]
            DataFrame
                Row forward scan
                Frame forward scan on: hits

Execute: 915.35ms
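
Why the rewrite is safe can be illustrated outside of QuestDB. The sketch below is not the SqlOptimiser code; it is a minimal stand-alone example (class and method names are hypothetical) that assumes every extra key is a deterministic function of the one column. Under that assumption, grouping by (ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3) produces the same groups as grouping by ClientIP alone and computing the derived columns afterwards.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not QuestDB code.
class TrivialGroupByRewriteSketch {

    // "Before" shape: the hash key is the full tuple (ip, ip-1, ip-2, ip-3).
    static Map<List<Integer>, Long> groupByAllKeys(int[] clientIps) {
        Map<List<Integer>, Long> counts = new HashMap<>();
        for (int ip : clientIps) {
            counts.merge(List.of(ip, ip - 1, ip - 2, ip - 3), 1L, Long::sum);
        }
        return counts;
    }

    // "After" shape: group by the single column, then project the
    // derived columns once per group instead of once per row.
    static Map<List<Integer>, Long> groupBySingleKeyThenProject(int[] clientIps) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int ip : clientIps) {
            counts.merge(ip, 1L, Long::sum);
        }
        Map<List<Integer>, Long> projected = new HashMap<>();
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            int ip = e.getKey();
            projected.put(List.of(ip, ip - 1, ip - 2, ip - 3), e.getValue());
        }
        return projected;
    }

    public static void main(String[] args) {
        int[] ips = {10, 10, 7, 10, 7, 42};
        // Both plans must agree on every group and its count.
        System.out.println(groupByAllKeys(ips).equals(groupBySingleKeyThenProject(ips)));
    }
}
```

The single-key form hashes one value per row rather than four, which is consistent with the narrower `keys: [ClientIP]` in the "After change" plan.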

nwoolmer · 2024-05-15T14:55:02Z

core/src/main/java/io/questdb/griffin/SqlOptimiser.java

+        if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL
+                && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) {


nit: maybe this could be safely relaxed

core/src/main/java/io/questdb/griffin/model/QueryModel.java

nwoolmer · 2024-05-15T15:43:16Z

core/src/main/java/io/questdb/griffin/SqlOptimiser.java

+        if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL
+                && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) {
+
+            CharSequenceIntHashMap nestedCandidates = new CharSequenceIntHashMap();


nit: is there a lighter option?

You could move this map into a field and reuse it between the invocations.
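
A minimal sketch of that suggestion, assuming a plain `java.util.HashMap` as a stand-in for `CharSequenceIntHashMap` and a hypothetical `countDistinctColumns` pass in place of the real rewrite: the map is allocated once as a field and cleared at the start of each invocation, so repeated optimiser runs do not allocate a new map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not QuestDB code.
class OptimiserScratchSketch {

    // Allocated once per optimiser instance and reused across invocations.
    private final Map<String, Integer> nestedCandidates = new HashMap<>();

    // Stand-in for a rewrite pass: counts references per column name
    // and returns the number of distinct columns seen.
    int countDistinctColumns(String[] columnRefs) {
        nestedCandidates.clear(); // drop state left over from the previous call
        for (String ref : columnRefs) {
            nestedCandidates.merge(ref, 1, Integer::sum);
        }
        return nestedCandidates.size();
    }

    public static void main(String[] args) {
        OptimiserScratchSketch optimiser = new OptimiserScratchSketch();
        System.out.println(optimiser.countDistinctColumns(new String[]{"ClientIP", "ClientIP", "c"}));
        System.out.println(optimiser.countDistinctColumns(new String[]{"c"}));
    }
}
```

The trade-off is that the field makes the pass stateful, so it is only safe if a single optimiser instance is never used by two threads at once.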

nwoolmer · 2024-05-15T15:44:17Z

core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java

    }

+    @Test
+    public void testRewriteTrivialExpressionsBasic() throws Exception {


more tests would be good

ideoma · 2024-05-28T10:08:55Z

[PR Coverage check]

😍 pass : 49 / 50 (98.00%)

file detail

   path                                       covered lines   new lines   coverage
🔵 io/questdb/griffin/SqlOptimiser.java       48              49          97.96%
🔵 io/questdb/griffin/model/QueryModel.java   1               1           100.00%

puzpuzpuz · 2024-05-28T10:28:27Z

We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here:

questdb/core/src/main/java/io/questdb/griffin/engine/groupby/vect/GroupByRecordCursorFactory.java

Line 82 in fe2aeca

this.workerCount = workerCount;
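
For readers unfamiliar with the term, the following is a generic sketch of hash table sharding. It is not the QuestDB implementation, and the shard count and all names are assumptions. Each worker splits its partial result into a fixed number of shards by key hash. Because a given key always lands in the same shard index, shard i from every worker can be merged independently of the other shards, so the merge phase can be spread over many cores instead of combining whole per-worker tables one after another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not QuestDB code.
class ShardedGroupBySketch {

    static final int SHARDS = 4; // assumed shard count

    static int shardOf(int key) {
        return Math.floorMod(Integer.hashCode(key), SHARDS);
    }

    // One worker aggregates its slice of rows into SHARDS small maps.
    static List<Map<Integer, Long>> aggregateSlice(int[] slice) {
        List<Map<Integer, Long>> shards = new ArrayList<>();
        for (int i = 0; i < SHARDS; i++) {
            shards.add(new HashMap<>());
        }
        for (int key : slice) {
            shards.get(shardOf(key)).merge(key, 1L, Long::sum);
        }
        return shards;
    }

    // Merge shard by shard. Shards hold disjoint key sets, so each
    // iteration of the outer loop could run on its own thread; it is
    // sequential here to keep the sketch short.
    static Map<Integer, Long> merge(List<List<Map<Integer, Long>>> perWorker) {
        Map<Integer, Long> result = new HashMap<>();
        for (int s = 0; s < SHARDS; s++) {
            Map<Integer, Long> merged = new HashMap<>();
            for (List<Map<Integer, Long>> worker : perWorker) {
                for (Map.Entry<Integer, Long> e : worker.get(s).entrySet()) {
                    merged.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
            result.putAll(merged);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Map<Integer, Long>>> workers = List.of(
                aggregateSlice(new int[]{1, 2, 2, 5}),
                aggregateSlice(new int[]{2, 5, 9}));
        System.out.println(merge(workers));
    }
}
```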

puzpuzpuz · 2024-05-29T06:32:48Z

> We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here

I have a better idea: we should disable keyed Rosti for all types but SYMBOL. That's because SYMBOL type has low cardinality, while it's not always the case for INT or IPv4. So, for high cardinality columns Java-based GROUP BY will be faster than the Rosti one.

@bluestreak01 WDYT?

bluestreak01 · 2024-05-29T11:00:21Z

How does it compare effort-wise with improving Rosti? After all, it is a C++ map, so we have more options there. What do you think?

puzpuzpuz · 2024-05-29T11:20:09Z

> how does it compare effort wise with improving Rosti? After all it is C++ map, we have more options there? What do you think?

This is not a trivial thing to do, but it's certainly feasible. The thing is that keyed Rosti has very limited usage, so I'm not sure it's worth spending the time just to speed up the INT and IPv4 key case.

nwoolmer · 2024-05-30T10:16:33Z

Plan is to redo this more generically, with bug fix included.

Rosti/Async optimisations are a separate issue but should be addressed before next release, so as to preserve performance of this type of query on large boxes.

puzpuzpuz · 2024-10-10T06:47:03Z

@nwoolmer why did we close this PR in the end? This rewrite is certainly valuable.

nwoolmer and others added 7 commits May 14, 2024 17:41

First pass at rewriting, not quite there

bec0042

All gets broken after propagateTopDownColumns

a906708

Avoid duplication of functions

f473b45

Virtualise

53d9158

Working version but non ideal

7171d72

Revised version which modifies in place, refactor and tests tba

08c4284

Add basic test. Performance improvement is 1.7s -> 900ms on M2 chip for Q35. More refactoring tba

77f2d12

nwoolmer added the SQL and Performance labels May 15, 2024

Merge branch 'master' into nw_rewrite_trivial_expr

d513f3f

Refactoring

f849066

nwoolmer requested a review from puzpuzpuz May 15, 2024 15:40

nwoolmer and others added 2 commits May 15, 2024 16:41

Exception never thrown lint

d1602f4

Merge branch 'master' into nw_rewrite_trivial_expr

0fb5bec

nwoolmer marked this pull request as ready for review May 15, 2024 15:41

nwoolmer added the ready for review label May 15, 2024

Rename

949d270

nwoolmer and others added 6 commits May 15, 2024 16:45

Merge remote-tracking branch 'origin/nw_rewrite_trivial_expr' into nw_rewrite_trivial_expr

10331c6

Merge branch 'master' into nw_rewrite_trivial_expr

29d8943

Fix renamed utility

ccbd4bb

Fix test

6eb577f

Merge branch 'master' into nw_rewrite_trivial_expr

8121e0e

Remove alias index rebuild since removeColumn has been changed to do this.

4950ad8

nwoolmer added the DO NOT MERGE label May 28, 2024

Merge branch 'master' into nw_rewrite_trivial_expr

a83af93

nwoolmer closed this May 30, 2024

bluestreak01 deleted the nw_rewrite_trivial_expr branch July 19, 2024 18:07

puzpuzpuz mentioned this pull request Aug 9, 2025

perf(sql): rewrite trivial expressions over same column in GROUP BY queries #6043

Merged

nwoolmer wants to merge 19 commits into master from nw_rewrite_trivial_expr

4 participants
