
fix(sql): prevent SIGSEGV in window join by awaiting workers before freeing cache #6662

Merged
bluestreak01 merged 6 commits into master from vi_fix_wj on Jan 19, 2026

@bluestreak01
Member

Summary

Fixes #6661 - SIGSEGV crash when using WINDOW JOIN with large slave tables.

Root cause: AsyncWindowJoinRecordCursor.close() was freeing slaveTimeFrameAddressCache before waiting for worker threads to finish. When workers tried to access the freed DirectLongList (whose internal address becomes 0), they would read from address + offset where address = 0, causing SIGSEGV.

Fix:

  • Reorder operations in close() to await workers before freeing shared resources
  • Wrap cleanup in try-finally to ensure resources are always freed even if an exception occurs
  • Also fix a minor issue in WindowJoinPrevailingCache so that the rowIndex < 0 check handles Long.MIN_VALUE
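
The reordering described above can be illustrated with a minimal, self-contained sketch. It is not the QuestDB implementation: `FakeDirectList`, `Cursor`, `awaitWorkers()` and the latch-based coordination are hypothetical stand-ins for `DirectLongList`, `AsyncWindowJoinRecordCursor` and `frameSequence.await()`. It only shows the shape of the fix, namely that `close()` waits for workers first and frees the shared cache in a `finally` block.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these types exist in QuestDB.
public class AwaitBeforeFreeSketch {

    /** Stand-in for an off-heap list whose base address becomes 0 once freed. */
    static final class FakeDirectList implements AutoCloseable {
        private volatile long address = 0x1000; // pretend allocation

        boolean isOpen() {
            return address != 0;
        }

        @Override
        public void close() {
            address = 0;
        }
    }

    /** Stand-in for a cursor that shares a cache with worker threads. */
    static final class Cursor implements AutoCloseable {
        private final FakeDirectList sharedCache = new FakeDirectList();
        private final CountDownLatch workersDone;
        final AtomicBoolean workerSawFreedCache = new AtomicBoolean();

        Cursor(int workerCount) {
            workersDone = new CountDownLatch(workerCount);
            for (int i = 0; i < workerCount; i++) {
                new Thread(() -> {
                    try {
                        for (int n = 0; n < 1000; n++) {
                            // A real worker would read from address + offset here.
                            if (!sharedCache.isOpen()) {
                                workerSawFreedCache.set(true);
                            }
                        }
                    } finally {
                        workersDone.countDown();
                    }
                }).start();
            }
        }

        @Override
        public void close() {
            try {
                // 1. Wait until no worker can touch the shared cache any more.
                awaitWorkers();
            } finally {
                // 2. Free the cache even if awaiting threw.
                sharedCache.close();
            }
        }

        private void awaitWorkers() {
            try {
                workersDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean cacheFreed() {
            return !sharedCache.isOpen();
        }
    }

    public static void main(String[] args) {
        Cursor cursor = new Cursor(4);
        cursor.close();
        System.out.println("freed=" + cursor.cacheFreed()
                + ", workerSawFreedCache=" + cursor.workerSawFreedCache.get());
    }
}
```

Swapping the two steps in `close()` would reintroduce the race: a worker still inside its loop could observe `address == 0`, which in native code corresponds to the read from address 0 plus an offset.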

Test plan

  • Added regression test testWindowJoinWithPrevailingOnEmptyResultSetRegression
  • All 180 WindowJoinTest tests pass (110 executed, 70 skipped due to parameterization)

🤖 Generated with Claude Code

@coderabbitai

coderabbitai bot commented Jan 18, 2026

Important

Review skipped

Auto reviews are disabled on this repository.


@bluestreak01 added the Bug and SQL labels on Jan 18, 2026
@puzpuzpuz
Contributor

> Root cause: AsyncWindowJoinRecordCursor.close() was freeing slaveTimeFrameAddressCache before waiting for worker threads to finish.

That's not the only scenario in which this bug shows up, and the added test demonstrates that. While the AsyncWindowJoinRecordCursor bugfix is good, there is an inefficiency in the frameSequence.await() calls made by all async record cursors. When closing a cursor, we should skip all tasks that belong to the frame sequence being closed rather than reduce them. Instead, we seem to reduce the published tasks, which wastes CPU time and only increases query latency.

That's why the segfault happened reliably when a LIMIT n clause was added: the query owner thread (if not one of the worker threads) was guaranteed to reduce the remaining tasks.
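
To make the suggested skip-instead-of-reduce behavior concrete, here is a rough single-threaded sketch. Everything in it is hypothetical (`Task`, `drain`, the integer sequence ids); it does not reflect QuestDB's actual queue or frame sequence API. It only illustrates draining outstanding tasks while doing no reduce work for the sequence being closed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: these names are illustrative, not QuestDB types.
public class SkipOnCloseSketch {

    static final class Task {
        final int sequenceId;
        boolean reduced; // true if the (expensive) reduce step ran
        boolean done;    // true once the task has been accounted for

        Task(int sequenceId) {
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Drains the queue. Tasks that belong to the closing sequence are marked
     * done without being reduced; all other tasks are reduced as usual.
     * Returns the number of tasks that were actually reduced.
     */
    static int drain(Queue<Task> queue, int closingSequenceId) {
        int reducedCount = 0;
        Task task;
        while ((task = queue.poll()) != null) {
            if (task.sequenceId != closingSequenceId) {
                task.reduced = true; // stand-in for the real filter/aggregate work
                reducedCount++;
            }
            task.done = true; // either way the task no longer blocks await()
        }
        return reducedCount;
    }

    public static void main(String[] args) {
        Queue<Task> queue = new ArrayDeque<>();
        queue.add(new Task(1));
        queue.add(new Task(2));
        queue.add(new Task(1));
        // Sequence 1 is being closed, so only the task for sequence 2 is reduced.
        System.out.println(drain(queue, 1)); // prints 1
    }
}
```

Under this scheme the closing thread still has to consume every published task so that the await completes, but it spends no CPU on results nobody will read.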

I'll create a GH issue for this performance enhancement.

@puzpuzpuz
Contributor

> I'll create a GH issue for this performance enhancement.

GH issue: #6665

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 84 / 84 (100.00%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java | 17 | 17 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java | 22 | 22 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursor.java | 10 | 10 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/AsyncTopKRecordCursor.java | 9 | 9 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/AsyncWindowJoinRecordCursor.java | 15 | 15 | 100.00% |
| 🔵 io/questdb/griffin/engine/join/WindowJoinPrevailingCache.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java | 10 | 10 | 100.00% |

@bluestreak01 bluestreak01 merged commit a1ccdbb into master Jan 19, 2026
43 checks passed
@bluestreak01 bluestreak01 deleted the vi_fix_wj branch January 19, 2026 12:24

Development

Successfully merging this pull request may close these issues.

SIGSEGV when using window join
