fix(core): fix table reading timeout error after non-wal table dropped and re-created by ideoma · Pull Request #6095 · questdb/questdb

ideoma · 2025-09-01T15:58:01Z

Found by fuzz tests.
When a non-WAL table is dropped and re-created, purge jobs can push the scoreboard's max txn forward by calling TxnScoreboardPoolV2.isRangeAvailable(). This can cause a timeout when opening a TableReader, because the reader cannot lock the latest transaction in the scoreboard.

The fix is to stop modifying the max in TxnScoreboardPoolV2.isRangeAvailable(), making it a read-only scoreboard operation.
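The difference between a range check that moves max and one that only reads can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyScoreboard`, its method names, and the half-open range semantics are hypothetical and are not QuestDB's scoreboard code. The only rule borrowed from this PR is that a reader cannot lock a txn below the scoreboard's max.

```java
import java.util.TreeMap;

// Toy model for illustration only. Names and semantics are assumptions,
// not QuestDB's actual TxnScoreboardV2 code.
final class ToyScoreboard {
    // txn -> number of readers currently holding it
    private final TreeMap<Long, Integer> activeReaders = new TreeMap<>();
    // highest txn the scoreboard has been moved to
    private long max = 0;

    // Assumed rule (from the PR description): a reader cannot lock a txn below max.
    boolean acquireTxn(long txn) {
        if (txn < max) {
            return false;
        }
        max = txn;
        activeReaders.merge(txn, 1, Integer::sum);
        return true;
    }

    // Mutating shape: answers the question but also moves max forward to `to`.
    boolean isRangeAvailableMutating(long from, long to) {
        boolean free = activeReaders.subMap(from, to).isEmpty();
        if (free && to > max) {
            max = to;
        }
        return free;
    }

    // Read-only shape: answers the same question and leaves max alone.
    boolean isRangeAvailableReadOnly(long from, long to) {
        return activeReaders.subMap(from, to).isEmpty();
    }
}

final class ScoreboardDemo {
    public static void main(String[] args) {
        ToyScoreboard mutated = new ToyScoreboard();
        mutated.isRangeAvailableMutating(0, 13); // purge-style check on a stale range
        System.out.println("acquire txn 0 after mutating check: " + mutated.acquireTxn(0));

        ToyScoreboard untouched = new ToyScoreboard();
        untouched.isRangeAvailableReadOnly(0, 13);
        System.out.println("acquire txn 0 after read-only check: " + untouched.acquireTxn(0));
    }
}
```

In this toy model, the mutating check leaves max at 13, so a reader asking for txn 0 is refused, while the read-only check leaves the scoreboard usable for the re-created table.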


coderabbitai · 2025-09-01T15:58:08Z

Important

Review skipped

Auto reviews are disabled on this repository.



puzpuzpuz · 2025-09-02T07:39:38Z

> When a non-WAL table is dropped and re-created, purge jobs can push the scoreboard's max txn forward by calling TxnScoreboardPoolV2.isRangeAvailable(). This can cause a timeout when opening a TableReader, because the reader cannot lock the latest transaction in the scoreboard.

Why exactly do table readers time out? Aren't they supposed to spin, reading the latest txn and then trying to acquire it? Or do we somehow end up with the max txn being incremented many times in this scenario?

ideoma · 2025-09-02T10:35:50Z

> Why exactly do table readers time out? Aren't they supposed to spin, reading the latest txn and then trying to acquire it? Or do we somehow end up with the max txn being incremented many times in this scenario?

The purge job calls isRangeAvailable(0, 13) because it tries to clean up the dropped table with the same dir name (a non-WAL table). This pushes max txn to 13 in the scoreboard. Readers then time out because they spin trying to acquire txn 0, which they read from the _txn file, and they cannot because max is 13.
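The reader side of this can be sketched as a bounded spin loop. This is a simplified illustration under stated assumptions, not the TableReader implementation: `ReaderOpenSketch`, `openReader`, the `-1` timeout result, and the two functional parameters standing in for the _txn file and the scoreboard are all hypothetical.

```java
import java.util.function.LongPredicate;
import java.util.function.LongSupplier;

// Hypothetical sketch of a reader that spins until it can lock the txn it
// read from the _txn file. Not QuestDB code.
final class ReaderOpenSketch {
    // Returns the locked txn, or -1 if the spin timed out.
    static long openReader(LongSupplier txnFile, LongPredicate tryAcquire, long timeoutNanos) {
        final long deadline = System.nanoTime() + timeoutNanos;
        do {
            long txn = txnFile.getAsLong();   // re-read the latest committed txn
            if (tryAcquire.test(txn)) {       // try to lock it in the scoreboard
                return txn;
            }
            Thread.onSpinWait();
        } while (System.nanoTime() - deadline < 0);
        return -1;
    }
}

final class ReaderSpinDemo {
    public static void main(String[] args) {
        final long staleMax = 13; // max pushed by the purge job for the dropped table
        // The re-created table's _txn file says 0, but the scoreboard refuses anything below max.
        long stuck = ReaderOpenSketch.openReader(() -> 0L, txn -> txn >= staleMax, 1_000_000L);
        System.out.println("with stale max: " + stuck);

        // With max left untouched at 0, the first attempt succeeds.
        long ok = ReaderOpenSketch.openReader(() -> 0L, txn -> txn >= 0L, 1_000_000L);
        System.out.println("with untouched max: " + ok);
    }
}
```

Re-reading the txn on every iteration does not help in the first case: the _txn file of the re-created table keeps returning 0, so no amount of spinning can catch up with a max of 13.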

puzpuzpuz · 2025-09-02T13:00:30Z

> because it tries to clean up the dropped table with the same dir name (a non-WAL table)

Gotcha, that's the culprit!

glasstiger · 2025-09-02T14:23:37Z

[PR Coverage check]

😍 pass : 20 / 22 (90.91%)

file detail

| | path | covered lines | new lines | coverage |
|---|---|---|---|---|
| 🔵 | io/questdb/tasks/ColumnPurgeTask.java | 5 | 6 | 83.33% |
| 🔵 | io/questdb/cairo/ColumnPurgeOperator.java | 10 | 11 | 90.91% |
| 🔵 | io/questdb/cairo/VacuumColumnVersions.java | 1 | 1 | 100.00% |
| 🔵 | io/questdb/griffin/PurgingOperator.java | 2 | 2 | 100.00% |
| 🔵 | io/questdb/cairo/CairoEngine.java | 1 | 1 | 100.00% |
| 🔵 | io/questdb/cairo/ColumnPurgeJob.java | 1 | 1 | 100.00% |

fix(core): fix table reading timeout error after non-wal table dropped and re-created

5b3bdae

puzpuzpuz reviewed Sep 1, 2025

core/src/main/java/io/questdb/cairo/TxnScoreboardV2.java (outdated, resolved)

ideoma added 2 commits September 1, 2025 20:09

fix test, add comments

73159a6

change the fix, revert scoreboard change and fix column purge operator instead

79732de

puzpuzpuz reviewed Sep 2, 2025

core/src/main/java/io/questdb/cairo/CairoEngine.java (outdated, resolved)

puzpuzpuz reviewed Sep 2, 2025

core/src/main/java/io/questdb/cairo/ColumnPurgeOperator.java (outdated, resolved)

puzpuzpuz added the Bug and Core labels Sep 2, 2025

puzpuzpuz previously approved these changes Sep 2, 2025


ideoma added 2 commits September 2, 2025 14:34

rename tableName -> tableToken

aaeb65a

resolve nits from the review

df69ff5

ideoma dismissed puzpuzpuz’s stale review via df69ff5 September 2, 2025 13:49

puzpuzpuz self-requested a review September 2, 2025 15:34

puzpuzpuz approved these changes Sep 2, 2025


ideoma merged commit e3c705c into master Sep 2, 2025
35 checks passed

ideoma deleted the fix-reader-open-timeout-on-scoreboard-lock branch September 2, 2025 16:57

ideoma mentioned this pull request Sep 3, 2025

fix(core): fix failure handling on column renames #5752

Merged

ideoma merged 5 commits into master from fix-reader-open-timeout-on-scoreboard-lock
