test(ilp): fix race condition in testSymbolCapacityReloadFuzz()#6389
test(ilp): fix race condition in testSymbolCapacityReloadFuzz()#6389bluestreak01 merged 2 commits intomasterfrom
Conversation
Easy to reproduce spurious failure. It's sufficient to inject
randomized pauses to Sender threads:
```java
for (int i = 0; i < N; i++) {
sender1.table("mytable")
.symbol("sym1", rnd1.nextString(10))
.symbol("sym2", rnd1.nextString(2))
.doubleColumn("dd", rnd1.nextDouble())
.atNow();
if (i % 1000 == 0) {
Os.sleep(ThreadLocalRandom.current().nextInt(10));
}
}
sender1.flush();
```
With this extra sleep the test fails consistently on my local.
Root cause: The test uses ILP@TCP - so sender.flush() only guarantees
ILP lines are successfully written to TCP send buffer. There can be
an arbitrary long pause between the flush and when the just
flushed lines are observed by TCP receiver, let alone WAL apply job.
The fix acknowledges this delay and relaxes the assertion. It also
runs the WalApply job even after the main WAL Apply thread is done,
becasue the WAL Apply Thread might finish before it had a chance
to observe all lines.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughA test in LineTcpSenderTest.java was refactored to replace explicit latch and pool listener-based synchronization with an assertEventually pattern. The testSymbolCapacityReloadFuzz method now uses eventual assertion to validate table state after draining the WAL queue instead of waiting for pool return events. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review pls |
|
@jerrinot: I'll review the changes in this PR. ✅ Actions performedReview triggered.
|
Easy to reproduce spurious failure. It's sufficient to inject randomized pauses to Sender threads:
With this extra sleep the test fails consistently on my local.
Root cause: The test uses ILP@TCP - so sender.flush() only guarantees ILP lines are successfully written to TCP send buffer. There can be an arbitrary long pause between the flush and when the just flushed lines are observed by TCP receiver, let alone WAL apply job.
The fix acknowledges this delay and relaxes the assertion. It also runs the WalApply job even after the main WAL Apply thread is done, becasue the WAL Apply Thread might finish before it had a chance to observe all lines.