perf(sql): rewrite trivial expressions over same column in GROUP BY queries#4508
perf(sql): rewrite trivial expressions over same column in GROUP BY queries#4508
Conversation
…or Q35. More refactoring tba
| if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL | ||
| && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) { |
There was a problem hiding this comment.
nit: maybe this could be safely relaxed
| if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL | ||
| && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) { | ||
|
|
||
| CharSequenceIntHashMap nestedCandidates = new CharSequenceIntHashMap(); |
There was a problem hiding this comment.
nit: is there a lighter option?
There was a problem hiding this comment.
You could move this map into a field and reuse it between the invocations.
| } | ||
|
|
||
| @Test | ||
| public void testRewriteTrivialExpressionsBasic() throws Exception { |
There was a problem hiding this comment.
more tests would be good
…_rewrite_trivial_expr
[PR Coverage check]😍 pass : 49 / 50 (98.00%) file detail
|
|
We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here: |
I have a better idea: we should disable keyed Rosti for all types but SYMBOL. That's because SYMBOL type has low cardinality, while it's not always the case for INT or IPv4. So, for high cardinality columns Java-based GROUP BY will be faster than the Rosti one. @bluestreak01 WDYT? |
|
how does it compare effort wise with improving Rosti? After all it is C++ map, we have more options there? What do you think? |
This is not a trivial thing to do, but it's certainly feasible. The thing is that keyed Rosti has a very limited usage, so I'm not sure if it's worth spending the time just to speed up INT and IPv4 keys case. |
|
Plan is to redo this more generically, with bug fix included. Rosti/Async optimisations are a separate issue but should be addressed before next release, so as to preserve performance of this type of query on large boxes. |
|
@nwoolmer why did we close this PR in the end? This rewrite is certainly valuable. |
Closes #4141
This relates to performance around Clickbench Q35.
For Clickbench Q35 on M2 Mac Mini, this speeds up the query from 1.7s to 0.9s.
The rewritten query runs using a Rosti implementation and an early limit, instead of async group by and a late limit.
With changes but async group by instead of rosti, it runs in 1.18s.
Query:
Before change:
Execute: 1.78sAfter change:
Execute: 915.35ms