perf(sql): vectorize avg(short) and improve sum() functions memory usage #6166
bluestreak01 merged 14 commits into master
…to puzpuzpuz_faster_avg_short
…to puzpuzpuz_faster_avg_short
[PR Coverage check] 😍 pass : 138 / 160 (86.25%)
@puzpuzpuz In the PR title, "vectorize" implies SIMD, but this change does not alter that aspect much; it mostly adds Java implementations of the average and sum functions for short. Perhaps the title should just say it speeds up short aggregate functions? Otherwise, good find, I'm happy to approve.
@ideoma the vectorization change is in
- final double value = Vect.avgShortAcc(address, frameRowCount, countsAddr + (long) workerId * Misc.CACHE_LINE_SIZE);
+ final long value = Vect.sumShort(address, frameRowCount);
Here,
Okay, fair enough; there is some vectorization, but it's far more than that.
How about "perf(sql): vectorize avg(short) and improve sum() functions memory usage"? I've changed the title to this one.
@ideoma thanks for the review!
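A minimal sketch of the accumulator argument made in the PR description, for readers who want to check the arithmetic. It is plain Java and not the PR's code: the real change calls the native Vect.sumShort, while the class name AvgShortSketch, the method names, and the choice to return NaN for empty input are assumptions made for illustration only.

```java
public class AvgShortSketch {

    // How many values equal to Short.MAX_VALUE a long accumulator can absorb
    // before it could overflow. For all-negative input the bound is slightly
    // lower (Long.MIN_VALUE / Short.MIN_VALUE = 2^48).
    static long maxSafeCount() {
        return Long.MAX_VALUE / Short.MAX_VALUE;
    }

    // Averages short values with a long accumulator: each addition is an
    // integer add, and only one conversion to double happens at the end.
    // Returning NaN for empty input is an assumption of this sketch.
    static double avg(short[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        long sum = 0;
        for (short v : values) {
            sum += v; // short widens to long, no floating-point work per row
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(maxSafeCount()); // prints 281483566907400
        System.out.println(avg(new short[]{1, 2, 3, 4}));
    }
}
```

The bound is far above any realistic row count, which is why a 64-bit accumulator is enough here and a 128-bit one buys nothing.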
Long.MAX_VALUE / Short.MAX_VALUE = 281,483,566,907,400, so using long128 as the accumulator in the Rosti avg(short) function was overkill. Using long instead of double as the accumulator in the Java avg(short) implementation is also beneficial, since it avoids the short-to-double conversion and uses faster addition instructions.
Also includes the following:
- sum() aggregate functions.
- SumShortGroupByFunction function. It's more efficient than SumLongGroupByFunction, which was previously used for short and byte values, since it has no null checks.
Hot execution times with ClickBench dataset on Ryzen 7900x running Ubuntu 24.04: