test(core): run concurrent reader operation in WAL writer fuzz tests#6524
test(core): run concurrent reader operation in WAL writer fuzz tests#6524bluestreak01 merged 11 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis PR introduces a rollback mechanism for column version tracking and extends the fuzz testing framework with query operations. Core changes include adding a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (1)
90-90: Consider documenting the return value semantics.The method always returns
false. While this appears intentional (read-only operations don't "succeed" in a way that affects transaction state), a brief comment explaining the return value semantics would improve maintainability.if (th instanceof TableReferenceOutOfDateException) { return false; } throw th; } - return false; + return false; // read-only operation, no state change to report }core/src/main/java/io/questdb/cairo/TableWriter.java (1)
442-457: Avoidversion - 1underflow and make mismatch logic clearer
The intended case is “_cv is one step ahead of _txn” (cv == txn + 1). Using subtraction is a tiny correctness risk and less readable; also cache values to avoid repeated reads.- if (columnVersionWriter.getVersion() != txWriter.getColumnVersion()) { - if (columnVersionWriter.getVersion() - 1 == txWriter.getColumnVersion()) { + final long cvVersion = columnVersionWriter.getVersion(); + final long txnCvVersion = txWriter.getColumnVersion(); + if (cvVersion != txnCvVersion) { + if (cvVersion == txnCvVersion + 1) { // This is case when transaction was aborted during column version change // we need to use previous version of _cv file, which should still be available for reading LOG.info().$("rolling back column version change to match _txn file [table=").$(tableToken) - .$(", txnVersion=").$(txWriter.getColumnVersion()) - .$(", actualFileVersion=").$(columnVersionWriter.getVersion()) + .$(", txnVersion=").$(txnCvVersion) + .$(", actualFileVersion=").$(cvVersion) .I$(); columnVersionWriter.rollback(); } else { throw CairoException.critical(0).put("Unrecoverable storage corruption detected: column version mismatch [table=").put(tableToken) - .put(", txnVersion=").put(txWriter.getColumnVersion()) - .put(", actualFileVersion=").put(columnVersionWriter.getVersion()) + .put(", txnVersion=").put(txnCvVersion) + .put(", actualFileVersion=").put(cvVersion) .put(']'); } }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java(3 hunks)core/src/main/java/io/questdb/cairo/TableWriter.java(2 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java(2 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java(2 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java(8 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/O3MaxLagFuzzTest.java(1 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java(1 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java(15 hunks)core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java(1 hunks)core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java(4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-07T00:59:31.522Z
Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: In QuestDB's Cairo engine, transaction (_txn) files have a strong invariant: they are never truncated below TX_BASE_HEADER_SIZE. Once created, they are either fully formed (size >= header size) or completely removed along with the entire table directory when the table is dropped.
Applied to files:
core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.javacore/src/main/java/io/questdb/cairo/TableWriter.java
📚 Learning: 2025-11-19T12:21:00.062Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.
Applied to files:
core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.javacore/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.javacore/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java
📚 Learning: 2025-11-07T00:59:31.522Z
Learnt from: bluestreak01
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-07T00:59:31.522Z
Learning: When checking for transaction file validity in QuestDB, use `ff.length(path) >= TX_BASE_HEADER_SIZE` instead of `ff.exists(path)`. The length check provides stronger guarantees under extreme system load by ensuring the file system catalog is synchronized and the file is fully formed, preventing SIGBUS errors when memory mapping the file.
Applied to files:
core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java
🧬 Code graph analysis (3)
core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (9)
core/src/main/java/io/questdb/cairo/CairoException.java (1)
CairoException(39-429)core/src/main/java/io/questdb/cairo/TableToken.java (1)
TableToken(38-192)core/src/main/java/io/questdb/cairo/sql/TableReferenceOutOfDateException.java (1)
TableReferenceOutOfDateException(33-93)core/src/main/java/io/questdb/griffin/SqlException.java (1)
SqlException(37-268)core/src/main/java/io/questdb/griffin/SqlExecutionContextImpl.java (1)
SqlExecutionContextImpl(59-499)core/src/main/java/io/questdb/std/LongList.java (1)
LongList(35-636)core/src/main/java/io/questdb/std/Rnd.java (1)
Rnd(39-491)core/src/main/java/io/questdb/std/str/StringSink.java (1)
StringSink(31-194)core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (1)
FailureFileFacade(39-559)
core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (2)
core/src/main/java/io/questdb/cairo/CairoException.java (1)
CairoException(39-429)core/src/main/java/io/questdb/std/Os.java (1)
Os(41-357)
core/src/main/java/io/questdb/cairo/TableWriter.java (1)
core/src/main/java/io/questdb/cairo/CairoException.java (1)
CairoException(39-429)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (12)
core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java (1)
267-292: LGTM!The new
setFuzzProbabilitiesoverload correctly propagates thequeryProbparameter to the fuzzer, maintaining consistency with the existing parameter ordering. The original overload withoutqueryProbprovides backward compatibility for tests that don't require query fuzzing.core/src/test/java/io/questdb/test/cairo/fuzz/O3ParquetPartitionFuzzTest.java (1)
178-182: LGTM!Setting
probabilityOfQueryto0.0correctly disables query operations for this parquet partition fuzz test, which focuses on data integrity validation rather than concurrent read testing.core/src/test/java/io/questdb/test/cairo/fuzz/O3MaxLagFuzzTest.java (1)
164-168: LGTM!The addition of
0.0forprobabilityOfQuerycorrectly adapts this test to the newgenerateSetsignature while keeping query fuzzing disabled, which is appropriate for this O3 max lag focused test.core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (1)
118-120: LGTM!The updated
setFuzzProbabilitiescalls correctly add the newqueryProbparameter. Using0.01(1% probability) for query operations provides good coverage for concurrent read testing without overwhelming the fuzz tests. This pattern is consistently applied across all test methods.core/src/test/java/io/questdb/test/cairo/fuzz/FailureFileFacade.java (1)
274-274: LGTM!The
[failure-facade]prefix improves debuggability by clearly identifying that these errors originate from the simulated failure mechanism rather than actual I/O failures.core/src/test/java/io/questdb/test/fuzz/FuzzTransactionGenerator.java (1)
210-216: LGTM!The query operation generation follows the established pattern. The limit calculation (
±(1 to 1000)) correctly tests both positive (head) and negative (tail) LIMIT clauses. Queries appropriately don't incrementmetaVersionorwaitBarrierVersionsince they're read-only operations.core/src/test/java/io/questdb/test/fuzz/FuzzQueryOperation.java (1)
41-91: Well-structured concurrent read operation for fuzz testing.The implementation correctly:
- Tracks
FailureFileFacadestate to distinguish between simulated failures and real errors- Handles expected exceptions during concurrent modifications (
TableReferenceOutOfDateException, table-does-not-exist cases)- Uses try-with-resources for proper
SqlExecutionContextcleanupThe query format
<tableName> LIMIT <limit>will effectively read column data to verify intermediate transaction files aren't corrupted, fulfilling the PR objective.core/src/main/java/io/questdb/cairo/ColumnVersionWriter.java (1)
165-174: Implementation is correct; edge cases are guarded by calling code.The
rollback()method correctly decrements the version and reloads state from the previous data area via the double-write (A/B) scheme. The edge case concern—whereversion - 1could be 0 or negative—is protected by the guard inTableWriter:rollback()is only called whencolumnVersionWriter.getVersion() - 1 == txWriter.getColumnVersion(), which ensures version >= 1 before decrement. If version were 0, the condition would fail and a critical error would be thrown instead.core/src/test/java/io/questdb/test/cairo/fuzz/FuzzRunner.java (2)
110-110: LGTM - queryProb integration is well-structured.The new
queryProbfield follows the established pattern for probability configuration in fuzz tests. The backward-compatible overload ensures existing call sites continue to work while new tests can specify query probability. The value is properly threaded through toFuzzTransactionGenerator.generateSeton line 496, which aligns with the PR's goal of adding concurrent reader operations to WAL writer fuzz tests.Also applies to: 496-496, 609-667
149-156: Good practice - final keywords improve code clarity.Making these local variables
finalprevents accidental reassignment and clearly communicates intent, especially helpful in this multi-threaded context where these collections are shared across threads.core/src/main/java/io/questdb/cairo/TableWriter.java (2)
442-457: Add a targeted test to exercise the ctor rollback branch (coverage currently 0% for new lines)
Right now this logic is likely untested (and PR notes show coverage gaps). A small regression test that creates a table dir with_cvattxn+1and_txnattxn(or mocks via existing test harness helpers) would lock in the intended recovery behavior and prevent future regressions.
3902-3906: No changes needed—API refactoring is complete and correct.All call sites to
txWriter.bumpMetadataAndColumnStructureVersion()andtxWriter.bumpMetadataVersion()have been updated to pass thedenseSymbolMapWritersparameter. The parameter is initialized once in the TableWriter constructor and is never null. No old no-arg signatures remain for direct TxWriter calls.
[PR Coverage check]😞 fail : 4 / 18 (22.22%) file detail
|
Adds concurrent reader operation to WAL writer fuzz tests to verify that the column files from intermediate txns are not corrupted.