fix(core): fix server failing to accept connections under high-load #6334

Merged

bluestreak01 merged 19 commits into master from jh_stranding_accept on Oct 31, 2025

Conversation

@jerrinot
Contributor

On busy Linux systems, when many clients try to open new connections simultaneously, the server could fail to accept some of them.

This manifests as client connection timeouts.

@jerrinot added the Bug (Incorrect or unexpected behavior) label on Oct 30, 2025
@coderabbitai

coderabbitai bot commented Oct 30, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR adds JVM garbage collection flags to CI configurations, refactors IO dispatcher methods to return boolean status indicators instead of void, updates platform-specific dispatcher implementations to consume these return values for control flow decisions, and extends test infrastructure with accept loop timeout configuration and improved error handling.

Changes

  • CI Configuration Updates (ci/templates/hosted-jobs.yml, ci/test-pipeline.yml): Added the -XX:+UseParallelGC JVM flag to MAVEN_OPTS for improved garbage collection.
  • IO Dispatcher Core API (core/src/main/java/io/questdb/network/AbstractIODispatcher.java): Changed the accept(long) and processDisconnects(long) method signatures from void to boolean to indicate processing status.
  • IO Dispatcher Platform Implementations (core/src/main/java/io/questdb/network/IODispatcherLinux.java, IODispatcherOsx.java, IODispatcherWindows.java): Adapted the implementations to capture and use the boolean return values from processDisconnects() and accept() for control-flow decisions. The Linux variant adds a pendingAccept state flag for conditional processing logic.
  • Test Infrastructure, PG Configuration (core/src/test/java/io/questdb/test/cutlass/pgwire/BasePGTest.java, PGArraysTest.java, PGJobContextTest.java): Added an acceptLoopTimeout field with randomized initialization to the test configuration, propagating timeout settings to PG server instances.
  • Test Error Handling (core/src/test/java/io/questdb/test/TestListener.java): Enhanced error handling in the thread dumper with null checks and try-catch blocks; thread dumper failures are now reported with formatted error messages.
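The void-to-boolean dispatcher refactor summarized above can be illustrated with a toy model. This is a deliberate simplification under assumed semantics, not QuestDB's actual code; the class, the per-call limit, and the integer "connections" are all illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simplification of the refactored contract: accept()
// now reports whether the listen backlog was fully drained, so the
// caller can retry on the next loop iteration instead of leaving
// connections stranded behind an edge-triggered notification.
public class AcceptSketch {
    final Deque<Integer> backlog = new ArrayDeque<>();
    private final int perCallLimit;

    public AcceptSketch(int perCallLimit) {
        this.perCallLimit = perCallLimit;
    }

    public void enqueue(int fd) {
        backlog.add(fd);
    }

    // Returns true when the backlog is empty (the EWOULDBLOCK case),
    // false when we exited early because the per-call limit was hit.
    public boolean accept() {
        int accepted = 0;
        while (!backlog.isEmpty()) {
            if (accepted == perCallLimit) {
                return false; // early exit: connections remain queued
            }
            backlog.poll();
            accepted++;
        }
        return true; // fully drained
    }

    public static void main(String[] args) {
        AcceptSketch d = new AcceptSketch(2);
        for (int i = 0; i < 5; i++) {
            d.enqueue(i);
        }
        System.out.println(d.accept()); // false: limit of 2 hit, 3 left
        System.out.println(d.accept()); // false: limit of 2 hit, 1 left
        System.out.println(d.accept()); // true: backlog drained
    }
}
```

A void-returning accept() would give the caller no way to distinguish the first two calls from the third, which is exactly the information the platform dispatchers now consume.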

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Focus areas:
    • Verify that all call sites of accept() and processDisconnects() in each dispatcher implementation correctly handle the new boolean return values
    • Cross-check Linux implementation's pendingAccept state machine logic against the functional intent
    • Confirm that the return value propagation in error paths (Windows select() failure case) is intentional and correct
    • Validate test configuration changes don't introduce timing-dependent test flakiness

Suggested labels

Core

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 5.88%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title Check ✅ Passed: The pull request title "fix(core): fix server failing to accept connections under high-load" directly relates to the core changes in the changeset. The primary modifications are in the IODispatcher classes (AbstractIODispatcher, IODispatcherLinux, IODispatcherOsx, and IODispatcherWindows), which handle connection acceptance on different platforms. The changes introduce a greedy accept loop with state tracking (the pendingAccept flag) and return-value propagation to address the root cause of connections being stranded in the queue under high load. The title is concise, specific, and clearly conveys the main purpose of the fix.
  • Description Check ✅ Passed: The pull request description is clearly related to the changeset. It describes the problem being fixed: the server failing to accept connections on busy Linux systems when many clients connect simultaneously, manifesting as client connection timeouts. This corresponds directly to the core changes in the IODispatcher classes, particularly IODispatcherLinux, which now includes state tracking and retry logic to handle edge-triggered epoll behavior. The description provides sufficient context without being vague or generic.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot
Contributor Author

@coderabbitai please review this little piggie 🐷

@coderabbitai

coderabbitai bot commented Oct 30, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get PR details and files changed
gh pr view 6334 --json title,body,files,additions,deletions --repo questdb/questdb

# Get the diff for the PR
gh pr diff 6334 --repo questdb/questdb

Length of output: 9188


@jerrinot 🐷 This is a well-crafted fix for a subtle edge-triggered epoll issue! Here's my review:

The Good 🎯

Core Fix is Correct
The root cause analysis is spot-on: with edge-triggered epoll, if accept() exits early (timeout/limit), remaining connections in the queue won't trigger another notification, causing them to be stranded. Your solution elegantly tracks this state and retries.

Smart Optimization
The placement of the pendingAccept check after the main event loop is clever:

// If a new connection event arrived in this round, the main loop's accept()
// would have drained all connections, resetting pendingAccept to false
if (pendingAccept && isListening()) {
    pendingAccept = !accept(timestamp);
    useful = true;
}

This avoids unnecessary syscalls when the queue was already drained by the main loop.
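The retry semantics of that check can be modeled in isolation. The following is a deliberately simplified sketch (not the actual QuestDB code; the per-call limit, the backlog of integers, and the runOnce() shape are illustrative) showing why the flag is needed: with edge-triggered epoll the readiness event fires once per burst, so after a partial drain the loop must call accept() again even when epoll_wait() reports nothing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of how a pendingAccept flag bridges edge-triggered epoll
// rounds: the "connection ready" event arrives once, and subsequent
// rounds rely on the flag rather than on a fresh notification.
public class PendingAcceptLoop {
    final Deque<Integer> backlog = new ArrayDeque<>();
    private boolean pendingAccept = false;
    private final int perCallLimit = 2; // stands in for timeout/limit exits

    // stand-in for accept(); true means the backlog was fully drained
    private boolean accept() {
        for (int i = 0; i < perCallLimit && !backlog.isEmpty(); i++) {
            backlog.poll();
        }
        return backlog.isEmpty();
    }

    // one event-loop round; 'events' models what epoll_wait returned
    boolean runOnce(int events) {
        boolean useful = false;
        if (events > 0 || pendingAccept) {
            pendingAccept = !accept();
            useful = true;
        }
        return useful;
    }

    public static void main(String[] args) {
        PendingAcceptLoop loop = new PendingAcceptLoop();
        for (int i = 0; i < 5; i++) {
            loop.backlog.add(i);
        }
        loop.runOnce(1); // edge event fires once: drains 2, 3 remain
        loop.runOnce(0); // no event, but pendingAccept retries: 1 remains
        loop.runOnce(0); // retry again: backlog fully drained
        System.out.println(loop.runOnce(0)); // prints false: nothing to do
    }
}
```

Without the flag, the second and third rounds would see zero events and do nothing, leaving three connections stranded in the kernel backlog.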

Good Documentation
The javadoc on accept() clearly explains the return value semantics, and the inline comments in IODispatcherLinux thoroughly explain the ET epoll behavior.

Test Coverage
Randomizing acceptLoopTimeout in tests (0-500ms range) will help catch timing-dependent issues.
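A randomized timeout of that sort might be set up as below. This is a hypothetical sketch, not the actual test code; the class and method names are invented, and the 10..509 ms range follows the figures quoted later in this review thread:

```java
import java.util.Random;

// Illustrative sketch: pick a random accept-loop timeout per test run
// and log the seed so a timing-dependent failure can be replayed.
public class TimeoutSetup {
    static long pickAcceptLoopTimeout(Random rnd) {
        // nextInt(500) yields 0..499, so the result is 10..509 ms
        return 10 + rnd.nextInt(500);
    }

    public static void main(String[] args) {
        long seed = System.currentTimeMillis();
        Random rnd = new Random(seed);
        long timeout = pickAcceptLoopTimeout(rnd);
        System.out.println("seed=" + seed + " acceptLoopTimeout=" + timeout);
    }
}
```

Logging the chosen value alongside the seed mirrors the diagnostic logging the review praises below: when a run fails, the exact timing configuration is recoverable from the test output.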

Minor Observations 🔍

  1. Logic Inversion - The error handling flip from if (nf.errno() != Net.EWOULDBLOCK) to if (nf.errno() == Net.EWOULDBLOCK) is correct and more readable now.

  2. Thread Safety - pendingAccept is not volatile/atomic, but this is fine since runSerially() operates in a single-threaded event loop.

  3. Edge Case - The isListening() check in the retry logic is good defensive programming - prevents accepting when the listener is paused.

Nitpick 🤏

The drainedFully variable is initialized to false and only set to true in one place. While functionally correct, you could simplify slightly by returning directly in the EWOULDBLOCK case and defaulting the return to false at the end. But the current approach is more explicit about intent, so it's fine as-is.

Verdict ✅

This fix properly addresses the high-load connection acceptance issue. The implementation is correct, well-documented, and includes good test coverage. Ship it! 🚢

@jerrinot
Contributor Author

Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (package-web-console) on project questdb: Failed to create assembly: Error creating assembly archive web-console: Problem creating zip: Execution exception: Java heap space -> [Help 1] 😬

ideoma
ideoma previously approved these changes Oct 30, 2025
…eive just a single notification, no matter how many connections are waiting
@bluestreak01
Member

@CodeRabbit review. This PR fixes message loss around timeouts. Could you check whether message loss in IODispatchers is possible under any other circumstances?

@coderabbitai

coderabbitai bot commented Oct 31, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the full implementation of IODispatcher classes to analyze for potential message loss
echo "=== IODispatcherLinux.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherLinux.java

echo -e "\n\n=== IODispatcherOsx.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherOsx.java

echo -e "\n\n=== IODispatcherWindows.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherWindows.java

echo -e "\n\n=== AbstractIODispatcher.java ==="
cat core/src/main/java/io/questdb/network/AbstractIODispatcher.java

Length of output: 77951


🏁 Script executed:

#!/bin/bash
# Look for I/O operation handling - read and write methods
echo "=== Finding read/write operations in IOContext implementations ==="
fd -e java -x grep -l "IOContext" {} \; | head -20

echo -e "\n=== Checking for buffer draining patterns ==="
rg -n -C3 "EWOULDBLOCK|EAGAIN" --type java core/src/main/java/io/questdb/network/

Length of output: 4657


🏁 Script executed:

#!/bin/bash
# Look for IOOperation and related context classes to understand message handling
echo "=== IOOperation interface ==="
find core -name "IOOperation.java" -exec cat {} \;

echo -e "\n\n=== IOContext interface ==="
find core -name "IOContext.java" -exec cat {} \;

echo -e "\n=== Looking for context implementations ==="
fd "Context.java$" core/src/main/java/io/questdb/ --exec basename {} \; | head -20

Length of output: 6277


🏁 Script executed:

#!/bin/bash
# Examine the read and write handling in dispatchers
echo "=== Searching for read buffer handling ==="
rg -n -A10 "readyForRead|processRead" --type java core/src/main/java/io/questdb/network/IODispatcher*.java

echo -e "\n\n=== Searching for write buffer handling ==="
rg -n -A10 "readyForWrite|processWrite" --type java core/src/main/java/io/questdb/network/IODispatcher*.java

Length of output: 17992


🏁 Script executed:

#!/bin/bash
# Check for connection handling and potential race conditions
echo "=== Disconnect and error handling ==="
rg -n -B3 -A5 "disconnect|close|error" --type java core/src/main/java/io/questdb/network/AbstractIODispatcher.java | head -100

echo -e "\n\n=== Event registration patterns ==="
rg -n "register.*FD|epoll|kqueue" --type java core/src/main/java/io/questdb/network/IODispatcher*.java | head -50

Length of output: 11178


@bluestreak01, great question! I've analyzed the IODispatcher implementations across all platforms. Your fix addresses the connection acceptance issue excellently. Here's what I found regarding other potential message loss scenarios:

✅ Areas That Look Good

  1. Normal Read/Write Operations: The registration flow properly handles partial reads/writes through the interestQueue → re-arm cycle. Workers re-register when they can't complete operations.

  2. TLS Plaintext Buffering: The optimization that immediately publishes READ operations when isMorePlaintextBuffered() returns true is safe because it happens in the single-threaded dispatcher context.

  3. Heartbeat Handling: Properly de-registers from epoll/kqueue during heartbeat and re-registers on completion.

⚠️ Potential Issue: Data Arrival During Suspended Operations

I found a scenario where message loss is possible with edge-triggered epoll (Linux) and kqueue (OSX):

The Problem:
When an operation is suspended (waiting on a SuspendEvent), the dispatcher registers the socket FD for READ to detect client disconnects:

if (suspendEvent != null) {
    // if the operation was suspended, we request a read to be able to detect a client disconnect
    operation = IOOperation.READ;
    // ... register suspend event ...
}

If data arrives while suspended:

  1. Edge-triggered epoll fires a READ event
  2. handleSocketOperation() sees the suspend event and calls testConnection()
  3. Connection is alive, so it re-arms epoll with EPOLL_CTL_MOD
  4. Critical: The data that triggered the event is never consumed
  5. With edge-triggered semantics, no new event fires until state changes (more data arrives)
  6. When suspension ends, resumeOperation() calls rearmEpoll() with EPOLL_CTL_MOD
  7. Problem: EPOLL_CTL_MOD doesn't reset the edge trigger - data remains stranded in kernel buffer

Impact:

  • Affected Platforms: Linux (epoll), OSX (kqueue) - both edge-triggered
  • Not Affected: Windows (select is level-triggered, will keep firing)
  • Probability: Medium - depends on suspend event usage and timing
  • Symptom: Data sits unprocessed until more data arrives or timeout occurs

Suggested Fix:
When resuming from suspension, either:

  1. Immediately publish the operation without waiting for epoll (similar to plaintext buffer bypass)
  2. Use EPOLL_CTL_DEL + EPOLL_CTL_ADD instead of EPOLL_CTL_MOD to reset the edge
  3. Perform a non-blocking read/write attempt to drain any buffered data

Example fix in IODispatcherLinux.java:

private void resumeOperation(C context, long id, int operation) {
    // Check if there's already data/space available (may have arrived during suspension)
    if (operation == IOOperation.READ && context.getSocket().isMorePlaintextBuffered()) {
        // Data available in TLS buffer, publish immediately
        context.clearSuspendEvent();
        publishOperation(operation, context);
        pending.deleteRow(pending.binarySearch(id, OPM_ID));
        return;
    }
    
    // Re-register with DEL+ADD to reset edge trigger
    long fd = context.getFd();
    epoll.control(fd, id, EpollAccessor.EPOLL_CTL_DEL, 0);
    epoll.control(fd, id, EpollAccessor.EPOLL_CTL_ADD, epollOp(operation, context));
    context.clearSuspendEvent();
}

📋 Other Observations

  1. OSX/Windows Accept: Neither has the retry logic you added for Linux, but both use level-triggered mechanisms (kqueue without EV_CLEAR for listener, select) so pending connections will keep triggering events. This should be fine.

  2. Connection Limit Race: The listener pause/resume logic looks robust with proper checks for concurrent limit changes.
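The level- vs edge-triggered distinction that observation rests on can be modeled with a toy sketch. This is illustrative only (real select/epoll/kqueue semantics are richer, and the queue of integers stands in for the kernel's completed-connection queue):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model: level-triggered readiness keeps reporting as long as the
// backlog is non-empty, while edge-triggered fires only when the queue
// grows. Draining partially and then waiting gets you no new event.
public class TriggerModel {
    final Deque<Integer> backlog = new ArrayDeque<>();
    private int seen = 0;

    void connect(int fd) {
        backlog.add(fd);
    }

    // select()-style: ready whenever anything is queued
    boolean levelTriggeredReady() {
        return !backlog.isEmpty();
    }

    // epoll ET-style: reports only when the queue grew since last poll
    boolean edgeTriggeredEvent() {
        int size = backlog.size();
        boolean fired = size > seen;
        seen = size;
        return fired;
    }

    public static void main(String[] args) {
        TriggerModel m = new TriggerModel();
        m.connect(1);
        m.connect(2);
        System.out.println(m.edgeTriggeredEvent());  // true: new arrivals
        m.backlog.poll();                            // partial drain
        System.out.println(m.edgeTriggeredEvent());  // false: stranded
        System.out.println(m.levelTriggeredReady()); // true: still queued
    }
}
```

This is why the review concludes that the OSX and Windows dispatchers are safe without the retry logic: their level-triggered mechanisms keep reporting the leftover entry, whereas the edge-triggered path needs the pendingAccept retry.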

Would you like me to create an issue to track the suspend event data loss scenario?

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d979ed8 and ffd26d3.

📒 Files selected for processing (10)
  • ci/templates/hosted-jobs.yml (1 hunks)
  • ci/test-pipeline.yml (1 hunks)
  • core/src/main/java/io/questdb/network/AbstractIODispatcher.java (4 hunks)
  • core/src/main/java/io/questdb/network/IODispatcherLinux.java (3 hunks)
  • core/src/main/java/io/questdb/network/IODispatcherOsx.java (1 hunks)
  • core/src/main/java/io/questdb/network/IODispatcherWindows.java (1 hunks)
  • core/src/test/java/io/questdb/test/TestListener.java (3 hunks)
  • core/src/test/java/io/questdb/test/cutlass/pgwire/BasePGTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (18)
core/src/test/java/io/questdb/test/cutlass/pgwire/BasePGTest.java (2)

94-94: LGTM! Good default timeout value.

The field declaration provides a sensible default of 500ms and allows test subclasses to customize the accept loop timeout for exercising different timing scenarios.


589-592: LGTM! Configuration properly propagated.

The override correctly propagates the test-configurable acceptLoopTimeout to the PG server configuration, enabling tests to exercise various timeout values.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGArraysTest.java (2)

81-81: LGTM! Randomization supports timing bug detection.

Randomizing the accept loop timeout (10-509 ms) helps catch timing-dependent issues related to the high-load accept fix, as intended by the PR.


87-87: LGTM! Helpful diagnostic logging.

Including the accept loop timeout in the diagnostic log aids debugging if timing-related test failures occur.

core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java (1)

235-241: LGTM! Test infrastructure enhancement supports timing-dependent testing.

The randomization of acceptLoopTimeout (10-509 ms) helps validate the connection acceptance fix under varying timeout conditions, which aligns with the PR's objective to address high-load connection failures. The logging addition aids in diagnosing timing-related issues during test runs.

ci/templates/hosted-jobs.yml (1)

121-121: Consistency confirmed with test-pipeline.yml.

The addition of -XX:+UseParallelGC here mirrors the change in ci/test-pipeline.yml, which is good. This template covers a wide range of test configurations across macOS, Windows, and Linux, so the change has broad coverage.

Please confirm that the verification from the ci/test-pipeline.yml review passes (checking for consistent MAVEN_OPTS definitions across all CI configs).

core/src/main/java/io/questdb/network/AbstractIODispatcher.java (2)

342-437: LGTM! Well-documented greedy accept loop with drain tracking.

The boolean return value correctly distinguishes between a fully drained accept queue (EWOULDBLOCK) and early exit due to timeout or connection limit. This is essential for the edge-triggered epoll fix, as the caller can now detect when connections remain queued and retry on the next iteration to avoid stranding.

The comment explaining why acceptEndTime uses clock.getTicks() rather than the passed timestamp is helpful.


480-491: LGTM! Boolean return propagates disconnect processing status.

The method now correctly returns whether useful work occurred (processing disconnects or closing the listener due to timeout). This enables platform-specific dispatchers to make informed control-flow decisions.

core/src/main/java/io/questdb/network/IODispatcherOsx.java (1)

401-463: LGTM! Properly integrates processDisconnects return value.

The method now correctly captures the boolean result from processDisconnects(timestamp) and incorporates it into the final return via the OR expression. This ensures that disconnect processing contributes to the "useful work" determination.

core/src/main/java/io/questdb/network/IODispatcherWindows.java (1)

158-167: LGTM! Improved error handling with disconnect status.

The method now correctly captures the result from processDisconnects(timestamp) and returns it on select failure, rather than unconditionally returning false. This ensures that any useful disconnect processing is properly reflected in the return value, even when select fails.

core/src/main/java/io/questdb/network/IODispatcherLinux.java (5)

36-36: LGTM! State flag for tracking incomplete accept operations.

The pendingAccept flag correctly tracks when accept() exits early without fully draining the queue, enabling retry on the next iteration to prevent connection stranding with edge-triggered epoll.


387-387: LGTM! Captures disconnect processing status.

Correctly integrates the boolean return from processDisconnects(timestamp) into the useful flag, aligning with the refactored API.


391-392: Excellent! Key fix for edge-triggered epoll connection stranding.

The modified condition if (n > 0 || pendingAccept) ensures that accept processing occurs even when epoll returns no new events, preventing connections from being stranded when the previous accept() call exited early due to timeout or connection limit. The acceptProcessed flag prevents redundant accept attempts within the same iteration.


402-404: LGTM! Correct state tracking for accept operation.

The logic correctly sets pendingAccept = !accept(timestamp) so that:

  • When accept() returns true (fully drained), pendingAccept becomes false
  • When accept() returns false (early exit), pendingAccept remains true for retry

The acceptProcessed flag prevents redundant accept calls within the same iteration.


413-418: Excellent! Retry mechanism prevents connection stranding across iterations.

This post-check correctly retries accept() when:

  1. pendingAccept is true (connections remain from previous iteration)
  2. acceptProcessed is false (accept wasn't already called in the epoll event loop)
  3. isListening() is true (server socket is active)

The comment about branch prediction is valuable—since pendingAccept is normally false in steady state, testing it first makes the branch highly predictable for the CPU.

core/src/test/java/io/questdb/test/TestListener.java (3)

52-54: LGTM! Good defensive null check.

The null check correctly handles the case where ThreadMXBean.getThreadInfo() returns null for threads that terminated between retrieving thread IDs and fetching thread info.


122-125: LGTM! Clearer error message.

The [kind=java] label helpfully distinguishes the Java thread dumper from the native thread dumper, improving diagnostics when failures occur.


134-142: LGTM! Error handling improves test infrastructure resilience.

The try-catch block ensures that failures in the native thread dumper are logged rather than silently terminating the monitoring thread. Catching Throwable is appropriate here for maximum resilience in test monitoring infrastructure.

@jerrinot
Contributor Author

@bluestreak01 I pushed 2 additional commits as discussed:

  1. io dispatcher inlines bully with additional 'closed' guard
  2. enforce minimal queue sizes

getLimit() is also used to calculate the backlog for the server socket. I did not change that, so a sudden spike of connections might still require a restart. I could not come up with a sensible minimal backlog size.
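For context, the backlog referred to here is the queue-length hint passed to listen(2). In plain Java it surfaces as the second argument to ServerSocket.bind; the snippet below is illustrative of the concept only and is unrelated to QuestDB's own Net API, and the value 128 is an arbitrary example:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative only: the backlog parameter caps how many completed
// connections the kernel may queue before the server accept()s them.
// A spike beyond the backlog leads to refused or dropped connections.
public class BacklogExample {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket();
        int backlog = 128; // illustrative value
        server.bind(new InetSocketAddress("127.0.0.1", 0), backlog);
        System.out.println("listening on port " + server.getLocalPort());
        server.close();
    }
}
```

Sizing it is the hard part the comment alludes to: too small and bursts are refused, too large and a stalled server hides a growing queue of timed-out clients.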

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 31 / 40 (77.50%)

file detail

path: covered lines / new lines (coverage)
🔵 io/questdb/network/IODispatcherWindows.java: 0 / 2 (0.00%)
🔵 io/questdb/network/IODispatcherOsx.java: 0 / 1 (0.00%)
🔵 io/questdb/network/AbstractIODispatcher.java: 18 / 24 (75.00%)
🔵 io/questdb/network/IODispatcherLinux.java: 10 / 10 (100.00%)
🔵 io/questdb/network/IODispatcherConfiguration.java: 3 / 3 (100.00%)

@bluestreak01 bluestreak01 merged commit ced2144 into master Oct 31, 2025
34 checks passed
@bluestreak01 bluestreak01 deleted the jh_stranding_accept branch October 31, 2025 17:29

Labels

Bug Incorrect or unexpected behavior
