Skip to content

perf(sql): reduce garbage generated on parallel query hot path#6597

Merged
bluestreak01 merged 25 commits intomasterfrom
puzpuzpuz_tweak_page_frame_formula
Jan 11, 2026
Merged

perf(sql): reduce garbage generated on parallel query hot path#6597
bluestreak01 merged 25 commits intomasterfrom
puzpuzpuz_tweak_page_frame_formula

Conversation

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@puzpuzpuz puzpuzpuz commented Jan 5, 2026

Reduces GC pressure and jitter in parallel queries on large multi-core machines (tested on c7a.metal-48xl with 192 vCPUs).

Key Changes

  1. Off-heap Page Frame Address Cache
  • PageFrameAddressCache now uses flat DirectLongList lists instead of nested LongList objects
  • Reduces ~20K object allocations per 1000 frames × 105 columns to just 4 contiguous off-heap lists
  • Memory footprint moves from on-heap (~3.4 MB for 1K page frames) to off-heap, eliminating GC pressure
  1. Lazy MapValue Flyweights
  • Decimal128, Decimal256, Long256 flyweights in map values are now lazily initialized
  • Avoids allocations when these types aren't used in queries. Decimal flyweight as relatively expensive since each of them allocates multiple int[] array.
  1. GroupByAllocator Improvements
  • LongLongHashMap is replaced with off-heap DirectLongLongHashMap. Previously, each query worker was growing the hash map when allocating leading to lots of on-heap allocations.
  1. Page Frame Size Calculation Improvements
  • Page frame sizes are now calculated to avoid tiny trailing frames within a partition. Minimal page frame size is also increased 2x for vectorized non-keyed GROUP BY since these tasks a more lightweight than filter or other GROUP BY ones.
  1. Work Stealing Strategy Tuning
  • New config: cairo.sql.parallel.work.stealing.spin.timeout (default 50µs)
  • Improvements to AdaptiveWorkStealingStrategy for better parallel execution. Namely, System.nanoTime() with configurable timeout is now used to avoid variable spinning time on different OSes and CPU architectures. Thread.onSpinWait() is used in the spin-wait loop to use PAUSE/YIELD instruction instead of "sleep(0)" system call.
  1. Proper Resource Cleanup
  • All classes using PageFrameAddressCache now call Misc.free(cursor) in close() methods
  1. Removed Config
  • Removed cairo.sql.jit.page.address.cache.threshold (no longer needed)

Benchmarks

I've run clickbench queries with 30 iterations per query on my Ryzen 7900x (12c/24t), 64GB RAM box running Ubuntu 24.04 and GraalVM CE 17.0.8.

GC Analysis Summary

Master Branch GC Metrics

  • Total runtime: ~623 seconds
  • GC events: 50+ (GC(0) through GC(50))
  • Final heap: ~1.15GB used of ~1.18GB allocated
  • GC types: Young (32), Mixed (6), Full (2), Concurrent Mark cycles (10+)

Patch Branch GC Metrics

  • Total runtime: ~739 seconds
  • GC events: 20 (GC(0) through GC(20))
  • Final heap: ~330MB used of ~532MB allocated
  • GC types: Young (14), Full (2), Concurrent Mark cycles (4)

The runtimes is different since I didn't stop the server immediately after the benchmark, so the server was idle for some time. But this shouldn't impact the end result since runtime for patch was longer.

GC Comparison

Metric Master Patch Difference
GC events (after warm-up) ~45 ~15 -67% fewer GCs
Final heap used 780 MB 330 MB -57% less memory
Heap allocated 1180 MB 532 MB -55% smaller heap

Key GC Pause Times (excluding startup Full GCs):

Type Master (sum) Patch (sum)
Young GC pauses ~95ms ~35ms
Remark pauses ~15ms ~8ms
Mixed GC pauses ~12ms 0ms

The patch shows a significant reduction in allocations - approximately 2.5x fewer GC cycles during benchmark execution, indicating the optimization reduces memory pressure substantially.

Query Time Analysis

Hot runs comparison according to clickbench rules (min of iterations 2-3, +10ms offset)

Query Master Min+10ms Patch Min+10ms Δ vs Master
Q0 0.010 0.010 0%
Q1 0.015 0.016 +3%
Q2 0.018 0.019 +3%
Q3 0.026 0.025 -4%
Q4 0.351 0.327 -7%
Q5 0.230 0.230 0%
Q6 0.011 0.011 0%
Q7 0.019 0.021 +8%
Q8 0.573 0.574 0%
Q9 0.657 0.648 -1%
Q10 0.102 0.093 -9%
Q11 0.077 0.077 0%
Q12 0.237 0.236 0%
Q13 0.359 0.356 -1%
Q14 0.267 0.278 +4%
Q15 0.404 0.403 0%
Q16 0.804 0.815 +1%
Q17 0.794 0.798 +1%
Q18 1.303 1.317 +1%
Q19 0.026 0.027 +4%
Q20 0.213 0.213 0%
Q21 0.219 0.223 +2%
Q22 0.214 0.215 0%
Q23 0.021 0.021 0%
Q24 0.012 0.012 0%
Q25 0.079 0.077 -3%
Q26 0.012 0.012 0%
Q27 0.293 0.288 -2%
Q28 1.956 2.096 +7%
Q29 0.015 0.016 +3%
Q30 0.187 0.190 +1%
Q31 0.238 0.238 0%
Q32 2.137 2.137 0%
Q33 1.199 1.209 +1%
Q34 1.169 1.199 +3%
Q35 0.320 0.312 -3%
Q36 0.054 0.036 -33%
Q37 0.030 0.030 -2%
Q38 0.028 0.027 -4%
Q39 0.075 0.074 -1%
Q40 0.035 0.033 -6%
Q41 0.034 0.033 -3%
Q42 0.016 0.016 0%

Note: Q28 is also tough for JVM JIT since it uses regexp_replace() SQL function, thus standard regexp library. So, the difference in that query is likely due to JVM JIT's jitter.

@puzpuzpuz puzpuzpuz self-assigned this Jan 5, 2026
@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Jan 5, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 5, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@puzpuzpuz puzpuzpuz added the DO NOT MERGE These changes should not be merged to main branch label Jan 8, 2026
@puzpuzpuz puzpuzpuz marked this pull request as ready for review January 8, 2026 09:10
@puzpuzpuz puzpuzpuz changed the title perf(sql): avoid very small tasks published to worker threads in parallel queries perf(sql): reduce garbage generated on parallel query hot path Jan 8, 2026
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_tweak_page_frame_formula branch from ccc6063 to 4ad275b Compare January 8, 2026 10:46
@puzpuzpuz puzpuzpuz removed the DO NOT MERGE These changes should not be merged to main branch label Jan 8, 2026
@puzpuzpuz puzpuzpuz added DO NOT MERGE These changes should not be merged to main branch and removed DO NOT MERGE These changes should not be merged to main branch labels Jan 8, 2026
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 533 / 598 (89.13%)

file detail

path covered line new line coverage
🔵 io/questdb/std/DirectIntIntHashMap.java 0 1 00.00%
🔵 io/questdb/std/DirectIntLongHashMap.java 0 2 00.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 0 1 00.00%
🔵 io/questdb/griffin/engine/table/AsyncFilteredRecordCursorFactory.java 0 2 00.00%
🔵 io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java 0 2 00.00%
🔵 io/questdb/griffin/engine/groupby/FlyweightCompactMapValue.java 0 6 00.00%
🔵 io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java 0 2 00.00%
🔵 io/questdb/griffin/engine/groupby/FlyweightMapValueImpl.java 0 12 00.00%
🔵 io/questdb/cairo/map/Unordered4MapValue.java 4 12 33.33%
🔵 io/questdb/cairo/map/Unordered8MapValue.java 4 12 33.33%
🔵 io/questdb/cairo/map/UnorderedVarcharMapValue.java 5 13 38.46%
🔵 io/questdb/griffin/engine/table/parquet/RowGroupBuffers.java 4 5 80.00%
🔵 io/questdb/griffin/engine/join/WindowJoinFastRecordCursorFactory.java 46 52 88.46%
🔵 io/questdb/griffin/engine/groupby/SampleByFillValueRecordCursorFactory.java 11 12 91.67%
🔵 io/questdb/griffin/engine/groupby/SampleByFillNullRecordCursorFactory.java 14 15 93.33%
🔵 io/questdb/std/DirectLongLongHashMap.java 126 129 97.67%
🔵 io/questdb/cairo/sql/PageFrameMemoryPool.java 52 53 98.11%
🔵 io/questdb/griffin/engine/join/WindowJoinRecordCursorFactory.java 4 4 100.00%
🔵 io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java 1 1 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameSequence.java 19 19 100.00%
🔵 io/questdb/std/Misc.java 5 5 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByFillValueNotKeyedRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java 1 1 100.00%
🔵 io/questdb/std/DirectIntList.java 2 2 100.00%
🔵 io/questdb/griffin/engine/groupby/vect/GroupByNotKeyedVectorRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/cairo/sql/async/WorkStealingStrategyFactory.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/FwdTableReaderPageFrameCursor.java 9 9 100.00%
🔵 io/questdb/cairo/sql/PageFrameMemoryRecord.java 54 54 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursor.java 3 3 100.00%
🔵 io/questdb/griffin/engine/join/AsyncWindowJoinAtom.java 14 14 100.00%
🔵 io/questdb/cairo/map/OrderedMapValue.java 12 12 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByFirstLastRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/PropertyKey.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/vect/GroupByRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByFillPrevNotKeyedRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/std/bytes/DirectByteSink.java 3 3 100.00%
🔵 io/questdb/griffin/engine/table/BwdTableReaderPageFrameCursor.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/FastGroupByAllocator.java 17 17 100.00%
🔵 io/questdb/griffin/SqlCodeGenerator.java 5 5 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 1 1 100.00%
🔵 io/questdb/cairo/map/UnorderedVarcharMap.java 3 3 100.00%
🔵 io/questdb/PropServerConfiguration.java 2 2 100.00%
🔵 io/questdb/cutlass/http/HttpHeaderParser.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByFillNullNotKeyedRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncTopKRecordCursor.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByInterpolateRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/json/JsonExtractSupportingState.java 2 2 100.00%
🔵 io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursor.java 9 9 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByAtom.java 8 8 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/SampleByFillNoneNotKeyedRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/std/str/DirectUtf8StringList.java 4 4 100.00%
🔵 io/questdb/griffin/engine/table/AbstractPageFrameRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameReduceTask.java 1 1 100.00%
🔵 io/questdb/std/DirectLongList.java 8 8 100.00%
🔵 io/questdb/griffin/engine/table/TimeFrameCursorImpl.java 3 3 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java 1 1 100.00%
🔵 io/questdb/griffin/engine/groupby/AbstractNoRecordSampleByCursor.java 2 2 100.00%
🔵 io/questdb/std/str/DirectUtf8Sink.java 1 1 100.00%
🔵 io/questdb/cairo/sql/async/AdaptiveWorkStealingStrategy.java 10 10 100.00%
🔵 io/questdb/cairo/sql/PageFrameAddressCache.java 31 31 100.00%
🔵 io/questdb/griffin/engine/table/AbstractPageFrameRecordCursor.java 2 2 100.00%
🔵 io/questdb/griffin/engine/groupby/GroupByNotKeyedRecordCursorFactory.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java 8 8 100.00%

@bluestreak01 bluestreak01 merged commit 81437b1 into master Jan 11, 2026
49 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_tweak_page_frame_formula branch January 11, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants