test(core): run concurrent reader operation in WAL writer fuzz tests by puzpuzpuz · Pull Request #6524 · questdb/questdb

puzpuzpuz · 2025-12-11T13:34:30Z

Adds concurrent reader operation to WAL writer fuzz tests to verify that the column files from intermediate txns are not corrupted.

coderabbitai · 2025-12-11T13:34:41Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces a rollback mechanism for column version tracking and extends the fuzz testing framework with query operations. Core changes include adding a rollback() method to ColumnVersionWriter, updating TableWriter to validate and reconcile column version state with transaction files during initialization, and introducing a new FuzzQueryOperation class and associated queryProb parameter for query-based fuzzing in the testing suite.

Changes

Cohort / File(s)	Summary
Column Version Recovery `core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java`, `core/src/main/java/io/questdb/cairo/TableWriter.java`	Added `rollback()` method to `ColumnVersionWriter` that decrements version and reloads state. Updated `TableWriter` constructor to validate column version against transaction file state, with automatic rollback if off-by-one or hard failure on unrecoverable mismatch. Updated `TxWriter` method calls to pass `denseSymbolMapWriters` parameter.
Fuzz Query Operations `core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java`	New class implementing `FuzzTransactionOperation` that executes SQL queries on tables with specified limits, handles concurrency failures and missing table conditions gracefully.
Fuzz Framework Configuration `core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java`, `core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java`, `core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java`	Added `queryProb` parameter to fuzz probability configuration, introduced overloaded `setFuzzProbabilities` methods, and extended `generateSet` to include query operation generation logic.
Fuzz Test Cases `core/src/test/java/io/questdb/test/cairo/fuzz/O3MaxLagFuzzTest.java`, `core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java`, `core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java`	Updated `FuzzTransactionGenerator.generateSet` calls to include `queryProb` argument (0.0 or 0.01 depending on test), shifting subsequent parameters accordingly.
Fuzz Test Infrastructure `core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java`	Prefixed failure messages with `[failure-facade]` tag for improved error traceability without altering exception semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

TableWriter constructor validation logic: Verify state reconciliation between column version and transaction file, ensure rollback and failure conditions are correct.
FuzzQueryOperation implementation: Confirm exception handling strategy, ensure concurrent failure detection via FailureFileFacade is sound.
FuzzTransactionGenerator changes: Verify probability normalization includes new queryProb parameter and that query operation generation integrates correctly without disrupting existing operations.
Cross-file consistency: Ensure all test file updates correctly pass the new queryProb argument with appropriate defaults (0.0 or 0.01).

Possibly related PRs

fix(core): fix failure handling on column renames #5752: Related to TableWriter construction changes and TxWriter initialization/transaction handling with recovery and rollback logic.

Suggested reviewers

ideoma

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.27% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding concurrent reader operation to WAL writer fuzz tests.
Description check	✅ Passed	The description is related to the changeset, explaining the purpose of adding concurrent reader operations to verify column file integrity.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java

…ent_fuzz_reads

puzpuzpuz · 2025-12-13T07:19:37Z

@ideoma ColumnVersionWriter#rollback() left version in inconsistent state. I fixed that here: 82566f7. PTAL

puzpuzpuz · 2025-12-13T07:20:00Z

@coderabbitai review

coderabbitai · 2025-12-13T07:20:05Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (1)

90-90: Consider documenting the return value semantics.

The method always returns false. While this appears intentional (read-only operations don't "succeed" in a way that affects transaction state), a brief comment explaining the return value semantics would improve maintainability.
         if (th instanceof TableReferenceOutOfDateException) {
             return false;
         }
         throw th;
     }
-    return false;
+    return false; // read-only operation, no state change to report
 }

core/src/main/java/io/questdb/cairo/TableWriter.java (1)

442-457: Avoid version - 1 underflow and make mismatch logic clearer
The intended case is “_cv is one step ahead of _txn” (cv == txn + 1). Using subtraction is a tiny correctness risk and less readable; also cache values to avoid repeated reads.

-            if (columnVersionWriter.getVersion() != txWriter.getColumnVersion()) {
-                if (columnVersionWriter.getVersion() - 1 == txWriter.getColumnVersion()) {
+            final long cvVersion = columnVersionWriter.getVersion();
+            final long txnCvVersion = txWriter.getColumnVersion();
+            if (cvVersion != txnCvVersion) {
+                if (cvVersion == txnCvVersion + 1) {
                     // This is case when transaction was aborted during column version change
                     // we need to use previous version of _cv file, which should still be available for reading
                     LOG.info().$("rolling back column version change to match _txn file [table=").$(tableToken)
-                            .$(", txnVersion=").$(txWriter.getColumnVersion())
-                            .$(", actualFileVersion=").$(columnVersionWriter.getVersion())
+                            .$(", txnVersion=").$(txnCvVersion)
+                            .$(", actualFileVersion=").$(cvVersion)
                             .I$();
                     columnVersionWriter.rollback();
                 } else {
                     throw CairoException.critical(0).put("Unrecoverable storage corruption detected: column version mismatch [table=").put(tableToken)
-                            .put(", txnVersion=").put(txWriter.getColumnVersion())
-                            .put(", actualFileVersion=").put(columnVersionWriter.getVersion())
+                            .put(", txnVersion=").put(txnCvVersion)
+                            .put(", actualFileVersion=").put(cvVersion)
                             .put(']');
                 }
             }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3137b86 and 7774928.

📒 Files selected for processing (10)

core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (3 hunks)
core/src/main/java/io/questdb/cairo/TableWriter.java (2 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java (2 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (2 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java (8 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/O3MaxLagFuzzTest.java (1 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java (1 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (15 hunks)
core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (1 hunks)
core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java (4 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-11-07T00:59:31.522Z

Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: In QuestDB's Cairo engine, transaction (_txn) files have a strong invariant: they are never truncated below TX_BASE_HEADER_SIZE. Once created, they are either fully formed (size >= header size) or completely removed along with the entire table directory when the table is dropped.

Applied to files:

core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java
core/src/main/java/io/questdb/cairo/TableWriter.java

📚 Learning: 2025-11-19T12:21:00.062Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.

Applied to files:

core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java
core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java
core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java

📚 Learning: 2025-11-07T00:59:31.522Z

Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: When checking for transaction file validity in QuestDB, use `ff.length(path) >= TX_BASE_HEADER_SIZE` instead of `ff.exists(path)`. The length check provides stronger guarantees under extreme system load by ensuring the file system catalog is synchronized and the file is fully formed, preventing SIGBUS errors when memory mapping the file.

Applied to files:

core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java

🧬 Code graph analysis (3)

core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (9)

core/src/main/java/io/questdb/cairo/CairoException.java (1)

CairoException (39-429)

core/src/main/java/io/questdb/cairo/TableToken.java (1)

TableToken (38-192)

core/src/main/java/io/questdb/cairo/sql/TableReferenceOutOfDateException.java (1)

TableReferenceOutOfDateException (33-93)

core/src/main/java/io/questdb/griffin/SqlException.java (1)

SqlException (37-268)

core/src/main/java/io/questdb/griffin/SqlExecutionContextImpl.java (1)

SqlExecutionContextImpl (59-499)

core/src/main/java/io/questdb/std/LongList.java (1)

LongList (35-636)

core/src/main/java/io/questdb/std/Rnd.java (1)

Rnd (39-491)

core/src/main/java/io/questdb/std/str/StringSink.java (1)

StringSink (31-194)

core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (1)

FailureFileFacade (39-559)

core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (2)

core/src/main/java/io/questdb/cairo/CairoException.java (1)

CairoException (39-429)

core/src/main/java/io/questdb/std/Os.java (1)

Os (41-357)

core/src/main/java/io/questdb/cairo/TableWriter.java (1)

core/src/main/java/io/questdb/cairo/CairoException.java (1)

CairoException (39-429)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)

GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
GitHub Check: New pull request (Hosted Running tests on windows-other-2)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-other-1)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
GitHub Check: New pull request (Check Changes Check changes)
GitHub Check: build
GitHub Check: build

🔇 Additional comments (12)

core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java (1)

267-292: LGTM!

The new setFuzzProbabilities overload correctly propagates the queryProb parameter to the fuzzer, maintaining consistency with the existing parameter ordering. The original overload without queryProb provides backward compatibility for tests that don't require query fuzzing.

core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java (1)

178-182: LGTM!

Setting probabilityOfQuery to 0.0 correctly disables query operations for this parquet partition fuzz test, which focuses on data integrity validation rather than concurrent read testing.

core/src/test/java/io/questdb/test/cairo/fuzz/O3MaxLagFuzzTest.java (1)

164-168: LGTM!

The addition of 0.0 for probabilityOfQuery correctly adapts this test to the new generateSet signature while keeping query fuzzing disabled, which is appropriate for this O3 max lag focused test.

core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (1)

118-120: LGTM!

The updated setFuzzProbabilities calls correctly add the new queryProb parameter. Using 0.01 (1% probability) for query operations provides good coverage for concurrent read testing without overwhelming the fuzz tests. This pattern is consistently applied across all test methods.

core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (1)

274-274: LGTM!

The [failure-facade] prefix improves debuggability by clearly identifying that these errors originate from the simulated failure mechanism rather than actual I/O failures.

core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java (1)

210-216: LGTM!

The query operation generation follows the established pattern. The limit calculation (±(1 to 1000)) correctly tests both positive (head) and negative (tail) LIMIT clauses. Queries appropriately don't increment metaVersion or waitBarrierVersion since they're read-only operations.

core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (1)

41-91: Well-structured concurrent read operation for fuzz testing.

The implementation correctly:

Tracks FailureFileFacade state to distinguish between simulated failures and real errors

Handles expected exceptions during concurrent modifications (TableReferenceOutOfDateException, table-does-not-exist cases)

Uses try-with-resources for proper SqlExecutionContext cleanup

The query format <tableName> LIMIT <limit> will effectively read column data to verify intermediate transaction files aren't corrupted, fulfilling the PR objective.

core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (1)

165-174: Implementation is correct; edge cases are guarded by calling code.

The rollback() method correctly decrements the version and reloads state from the previous data area via the double-write (A/B) scheme. The edge case concern—where version - 1 could be 0 or negative—is protected by the guard in TableWriter: rollback() is only called when columnVersionWriter.getVersion() - 1 == txWriter.getColumnVersion(), which ensures version >= 1 before decrement. If version were 0, the condition would fail and a critical error would be thrown instead.

core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java (2)

110-110: LGTM - queryProb integration is well-structured.

The new queryProb field follows the established pattern for probability configuration in fuzz tests. The backward-compatible overload ensures existing call sites continue to work while new tests can specify query probability. The value is properly threaded through to FuzzTransactionGenerator.generateSet on line 496, which aligns with the PR's goal of adding concurrent reader operations to WAL writer fuzz tests.

Also applies to: 496-496, 609-667

149-156: Good practice - final keywords improve code clarity.

Making these local variables final prevents accidental reassignment and clearly communicates intent, especially helpful in this multi-threaded context where these collections are shared across threads.

core/src/main/java/io/questdb/cairo/TableWriter.java (2)

442-457: Add a targeted test to exercise the ctor rollback branch (coverage currently 0% for new lines)
Right now this logic is likely untested (and PR notes show coverage gaps). A small regression test that creates a table dir with _cv at txn+1 and _txn at txn (or mocks via existing test harness helpers) would lock in the intended recovery behavior and prevent future regressions.

3902-3906: No changes needed—API refactoring is complete and correct.

All call sites to txWriter.bumpMetadataAndColumnStructureVersion() and txWriter.bumpMetadataVersion() have been updated to pass the denseSymbolMapWriters parameter. The parameter is initialized once in the TableWriter constructor and is never null. No old no-arg signatures remain for direct TxWriter calls.

glasstiger · 2025-12-13T23:25:51Z

[PR Coverage check]

😞 fail : 4 / 18 (22.22%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/cairo/ColumnVersionWriter.java	1	5	20.00%
🔵	io/questdb/cairo/TableWriter.java	3	13	23.08%

…6524)

chore(test): run concurrent reader thread in WAL writer fuzz tests

4713bf9

puzpuzpuz self-assigned this Dec 11, 2025

puzpuzpuz added the Core Related to storage, data type, etc. label Dec 11, 2025

puzpuzpuz changed the title ~~chore(test): run concurrent reader thread in WAL writer fuzz tests~~ test(core): run concurrent reader thread in WAL writer fuzz tests Dec 11, 2025

ideoma reviewed Dec 11, 2025

View reviewed changes

core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java Outdated Show resolved Hide resolved

puzpuzpuz and others added 7 commits December 11, 2025 18:20

Add tags to failure facade error messages

2306ca6

Switch to new fuzz operation instead of reader thread

140da3f

Newline

0ec2174

Freeze problematic seeds

2273e7a

fix fuzz rename column failure

ad8ecd9

Remove hardcoded seeds

5811741

Merge remote-tracking branch 'upstream/master' into puzpuzpuz_concurr…

2c6f43c

…ent_fuzz_reads

puzpuzpuz changed the title ~~test(core): run concurrent reader thread in WAL writer fuzz tests~~ test(core): run concurrent reader operation in WAL writer fuzz tests Dec 12, 2025

Fix inconsistent version field after ColumnVersionWriter#rollback()

82566f7

Merge branch 'master' into puzpuzpuz_concurrent_fuzz_reads

7774928

coderabbitai bot reviewed Dec 13, 2025

View reviewed changes

Merge branch 'master' into puzpuzpuz_concurrent_fuzz_reads

71970ee

bluestreak01 approved these changes Dec 13, 2025

View reviewed changes

bluestreak01 merged commit 00b9115 into master Dec 13, 2025
42 of 43 checks passed

bluestreak01 deleted the puzpuzpuz_concurrent_fuzz_reads branch December 13, 2025 23:37

mtopolnik pushed a commit that referenced this pull request Dec 16, 2025

test(core): run concurrent reader operation in WAL writer fuzz tests (#…

1af3b5b

…6524)

coderabbitai bot mentioned this pull request Mar 18, 2026

feat(core): support column add/drop with parquet partition storage #6875

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(core): run concurrent reader operation in WAL writer fuzz tests#6524

test(core): run concurrent reader operation in WAL writer fuzz tests#6524
bluestreak01 merged 11 commits intomasterfrom
puzpuzpuz_concurrent_fuzz_reads

puzpuzpuz commented Dec 11, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Dec 11, 2025 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

puzpuzpuz commented Dec 13, 2025

Uh oh!

puzpuzpuz commented Dec 13, 2025

Uh oh!

coderabbitai bot commented Dec 13, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

glasstiger commented Dec 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

puzpuzpuz commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

puzpuzpuz commented Dec 13, 2025

Uh oh!

puzpuzpuz commented Dec 13, 2025

Uh oh!

coderabbitai bot commented Dec 13, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

glasstiger commented Dec 13, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

puzpuzpuz commented Dec 11, 2025 •

edited

Loading

coderabbitai bot commented Dec 11, 2025 •

edited

Loading