Skip to content

test(ilp): fix race condition in testSymbolCapacityReloadFuzz()#6389

Merged
bluestreak01 merged 2 commits intomasterfrom
jh_flaky_testSymbolCapacityReloadFuzz
Nov 13, 2025
Merged

test(ilp): fix race condition in testSymbolCapacityReloadFuzz()#6389
bluestreak01 merged 2 commits intomasterfrom
jh_flaky_testSymbolCapacityReloadFuzz

Conversation

@jerrinot
Copy link
Copy Markdown
Contributor

Easy to reproduce spurious failure. It's sufficient to inject randomized pauses to Sender threads:

for (int i = 0; i < N; i++) {
    sender1.table("mytable")
            .symbol("sym1", rnd1.nextString(10))
            .symbol("sym2", rnd1.nextString(2))
            .doubleColumn("dd", rnd1.nextDouble())
            .atNow();
    if (i % 1000 == 0) {
        Os.sleep(ThreadLocalRandom.current().nextInt(10));
    }
}
sender1.flush();

With this extra sleep the test fails consistently on my local.

Root cause: The test uses ILP@TCP - so sender.flush() only guarantees ILP lines are successfully written to TCP send buffer. There can be an arbitrary long pause between the flush and when the just flushed lines are observed by TCP receiver, let alone WAL apply job.

The fix acknowledges this delay and relaxes the assertion. It also runs the WalApply job even after the main WAL Apply thread is done, becasue the WAL Apply Thread might finish before it had a chance to observe all lines.

Easy to reproduce spurious failure. It's sufficient to inject
randomized pauses to Sender threads:
```java
for (int i = 0; i < N; i++) {
    sender1.table("mytable")
            .symbol("sym1", rnd1.nextString(10))
            .symbol("sym2", rnd1.nextString(2))
            .doubleColumn("dd", rnd1.nextDouble())
            .atNow();
    if (i % 1000 == 0) {
        Os.sleep(ThreadLocalRandom.current().nextInt(10));
    }
}
sender1.flush();
```
With this extra sleep the test fails consistently on my local.

Root cause: The test uses ILP@TCP - so sender.flush() only guarantees
ILP lines are successfully written to TCP send buffer. There can be
an arbitrary long pause between the flush and when the just
flushed lines are observed by TCP receiver, let alone WAL apply job.

The fix acknowledges this delay and relaxes the assertion. It also
runs the WalApply job even after the main WAL Apply thread is done,
becasue the WAL Apply Thread might finish before it had a chance
to observe all lines.
@jerrinot jerrinot added the Bug Incorrect or unexpected behavior label Nov 13, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 13, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

A test in LineTcpSenderTest.java was refactored to replace explicit latch and pool listener-based synchronization with an assertEventually pattern. The testSymbolCapacityReloadFuzz method now uses eventual assertion to validate table state after draining the WAL queue instead of waiting for pool return events.

Changes

Cohort / File(s) Summary
Test synchronization refactoring
core/src/test/java/io/questdb/test/cutlass/line/tcp/LineTcpSenderTest.java
Added import for TestUtils.assertEventually. Removed writerIsBack latch and pool listener tracking EV_RETURN events in testSymbolCapacityReloadFuzz. Replaced with eventual assertion that drains WAL queue and validates final table count via SQL assertion.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify that the assertEventually-based approach correctly replaces the latch/listener synchronization logic
  • Confirm the WAL queue draining and table count validation are equivalent to the previous behavior
  • Check that the timeout and polling behavior of assertEventually is appropriate for this test scenario

Suggested reviewers

  • puzpuzpuz
  • bluestreak01

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing a race condition in testSymbolCapacityReloadFuzz(), which directly matches the file modified and the core objective of the PR.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about the race condition, reproduction steps, root cause analysis, and the specific fixes applied to the test.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot
Copy link
Copy Markdown
Contributor Author

@coderabbitai review pls

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 13, 2025

@jerrinot: I'll review the changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@bluestreak01 bluestreak01 merged commit 813dc07 into master Nov 13, 2025
40 checks passed
@bluestreak01 bluestreak01 deleted the jh_flaky_testSymbolCapacityReloadFuzz branch November 13, 2025 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Incorrect or unexpected behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants