fix(core): potential connection leakage on Windows by puzpuzpuz · Pull Request #6243 · questdb/questdb

puzpuzpuz · 2025-10-08T09:34:51Z

The connections were leaked here due to broken operation id order:

questdb/core/src/main/java/io/questdb/network/IODispatcherWindows.java

Lines 81 to 84 in bdd0fbf

    
           int heartbeatRow = pendingHeartbeats.binarySearch(srcOpId, OPM_ID); 
        
           if (heartbeatRow < 0) { 
        
               continue; // The connection is already closed. 
        
           } else {

The operation id order was broken due to missing id assignment here (Linux and OSX dispatchers have that assignment):

questdb/core/src/main/java/io/questdb/network/IODispatcherWindows.java

Lines 149 to 151 in bdd0fbf

    
           protected void pendingAdded(int index) { 
        
               pending.set(index, OPM_OPERATION, initialBias == IODispatcherConfiguration.BIAS_READ ? IOOperation.READ : IOOperation.WRITE); 
        
           }

The bug may have manifested in the following scenario: you open a TCP connection, but write to it only after some time, so that heartbeat mechanism has the time to kick in.

Also, includes the following:

Moves operation and event id generation code to AbstractIODispatcher, so that it's common across all I/O dispatcher implementations.
OSX dispatcher used an int for operation id which is easy to overflow. On overflow, we'd face the same problem with broken operation id order invariant in pending and pendingHeartbeat lists. This is fixed by switching to long counter as in the other two dispatchers.
Marks listener field as volatile in all pool implementations to avoid data races in tests.

coderabbitai · 2025-10-08T09:35:48Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Network I/O dispatchers refactor identifier handling (introduce centralized ID sequencing, switch int→long IDs on macOS, remove per-OS generators), adjust Kqueue API to use long data and expose OS FDs, minor logging/format tweaks, add a volatile listener field, add null checks in tests, and tweak test timeouts/socket options.

Changes

Cohort / File(s)	Summary of Changes
Native kqueue formatting `core/src/main/c/osx/kqueue.c`	Whitespace adjustments around casts in return statements; no logic change.
Pool listener volatility `core/src/main/java/io/questdb/cairo/pool/AbstractPool.java`	Make eventListener volatile with comment; no behavioral change elsewhere.
Dispatcher ID sequencing (core base) `core/src/main/java/io/questdb/network/AbstractIODispatcher.java`	Add private idSeq and protected nextEventId()/nextOpId(); change pendingAdded from abstract to no-op concrete; integrate ID generation where pending is queued.
Linux dispatcher cleanup `core/src/main/java/io/questdb/network/IODispatcherLinux.java`	Remove local ID sequencing (idSeq, nextEventId/nextOpId) and pendingAdded override; adjust logs/errno formatting; rely on existing IDs.
macOS dispatcher ID width and FD handling `core/src/main/java/io/questdb/network/IODispatcherOsx.java`	Switch IDs from int to long across pending, suspend, heartbeat, and logging; replace LongHashSet→IntHashSet for handled OS FDs; adapt to Kqueue long data; compare FDs via Files.toOsFd; remove nextEventId/nextOpId and pendingAdded override; adjust KeventWriter API usage.
Windows dispatcher adjustments `core/src/main/java/io/questdb/network/IODispatcherWindows.java`	Remove local op ID generation (idSeq/nextOpId); enhance heartbeat not-found logging; minor log formatting changes.
Kqueue API data width and FD exposure `core/src/main/java/io/questdb/network/Kqueue.java`	Change data access from int→long (getData, readFD, writeFD, internal commonFd); write DATA_OFFSET as long; remove getFd(), add getOsFd(); update listen flags usage.
Line TCP fuzz test null-guards `core/src/test/java/io/questdb/test/cutlass/line/tcp/AbstractLineTcpReceiverFuzzTest.java`	Add null checks before table permit/return/mapping to avoid NPEs.
Line TCP test timeout and socket opts `core/src/test/java/io/questdb/test/cutlass/line/tcp/AbstractLineTcpReceiverTest.java`	Override getTimeout() to 10 minutes; set TCP no-delay and keep-alive in newSocket(); add Dates import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as App code
  participant Dispatcher as AbstractIODispatcher
  participant OS as OS I/O (kqueue/epoll/IOCP)
  participant Conn as Connection

  rect rgb(240,248,255)
    note over Dispatcher: Centralized ID sequencing
    App->>Dispatcher: register(fd)
    Dispatcher->>Dispatcher: opId = nextOpId()
    Dispatcher->>OS: arm read/write with fd, opId (long on macOS)
  end

  OS-->>Dispatcher: event(fd, data=eventId/opId)
  Dispatcher->>Dispatcher: eventId = nextEventId()
  Dispatcher->>Conn: handleSocketOperation(fd, eventId/opId)
  alt heartbeat
    Dispatcher->>Dispatcher: correlate heartbeat by opId
  else normal I/O
    Dispatcher->>OS: re-arm as needed
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

mtopolnik

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.84% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title directly indicates that the pull request fixes a connection leakage issue on Windows, which is the primary problem addressed by the changeset, and correctly uses the conventional `fix(core)` prefix to signal a core bug fix. It is concise, specific, and understandable to teammates scanning the history.
Description Check	✅ Passed	The description clearly outlines the root cause of the connection leakage bug in the Windows dispatcher, references the relevant code locations, and summarizes the additional refactoring changes for id generation, OSX counter adjustments, and volatile listener fields. It directly relates to the provided diff and gives sufficient context for reviewers.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

core/src/main/java/io/questdb/cairo/pool/AbstractPool.java

bluestreak01 · 2025-10-09T12:18:59Z

@CodeRabbit review

coderabbitai · 2025-10-09T12:19:42Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

core/src/main/java/io/questdb/network/IODispatcherOsx.java (1)

231-238: Suspend event is re-registered during heartbeat instead of de-registered

Comment says "de-register", but code calls readFD(), which re-arms the event and then deletes its tracking row. This will fire an untracked kqueue event later and log "unexpected event id". Replace with removeReadFD().

-                } else {
-                    final long eventId = pendingEvents.get(eventRow, EVM_ID);
-                    keventWriter.prepare().readFD(suspendEvent.getFd(), eventId).done();
-                    pendingEvents.deleteRow(eventRow);
-                }
+                } else {
+                    // De-register the suspend event while heartbeat is in-flight.
+                    keventWriter.prepare().removeReadFD(suspendEvent.getFd()).done();
+                    pendingEvents.deleteRow(eventRow);
+                }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdd0fbf and 8ca493b.

📒 Files selected for processing (9)

core/src/main/c/osx/kqueue.c (1 hunks)
core/src/main/java/io/questdb/cairo/pool/AbstractPool.java (1 hunks)
core/src/main/java/io/questdb/network/AbstractIODispatcher.java (3 hunks)
core/src/main/java/io/questdb/network/IODispatcherLinux.java (11 hunks)
core/src/main/java/io/questdb/network/IODispatcherOsx.java (19 hunks)
core/src/main/java/io/questdb/network/IODispatcherWindows.java (2 hunks)
core/src/main/java/io/questdb/network/Kqueue.java (3 hunks)
core/src/test/java/io/questdb/test/cutlass/line/tcp/AbstractLineTcpReceiverFuzzTest.java (1 hunks)
core/src/test/java/io/questdb/test/cutlass/line/tcp/AbstractLineTcpReceiverTest.java (4 hunks)

🧰 Additional context used

🪛 ast-grep (0.39.6)

core/src/test/java/io/questdb/test/cutlass/line/tcp/AbstractLineTcpReceiverTest.java

[info] 303-303: "Detected use of a Java socket that is not encrypted. As a result, the
traffic could be read by an attacker intercepting the network traffic. Use
an SSLSocket created by 'SSLSocketFactory' or 'SSLServerSocketFactory'
instead."
Context: new Socket(sockaddr, fd)
Note: [CWE-319] Cleartext Transmission of Sensitive Information [REFERENCES]
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(unencrypted-socket-java)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)

GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
GitHub Check: New pull request (Hosted Running tests on windows-other)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
GitHub Check: New pull request (Check Changes Check changes)
GitHub Check: build
GitHub Check: build

🔇 Additional comments (2)

core/src/main/java/io/questdb/network/IODispatcherOsx.java (1)

66-68: Parity sanity: event vs operation IDs

isEventId() assumes event IDs are odd and op IDs are even. Please confirm nextOpId()/nextEventId() in AbstractIODispatcher guarantee this invariant across all dispatchers.

Also applies to: 266-267, 322-329

core/src/main/java/io/questdb/network/Kqueue.java (1)

84-90: Confirm EV flags for listen()

listen() now uses EV_ADD only (no EV_ONESHOT). That’s likely correct for a persistent server socket. Please confirm this change was intentional.

core/src/main/java/io/questdb/network/Kqueue.java

…o puzpuzpuz_flaky_ilp_tests

…lp_tests

glasstiger · 2025-10-09T16:15:19Z

[PR Coverage check]

😞 fail : 15 / 71 (21.13%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/network/IODispatcherWindows.java	0	8	00.00%
🔵	io/questdb/network/IODispatcherOsx.java	0	28	00.00%
🔵	io/questdb/network/Kqueue.java	0	4	00.00%
🔵	io/questdb/network/IODispatcherLinux.java	9	25	36.00%
🔵	io/questdb/network/AbstractIODispatcher.java	6	6	100.00%

bluestreak01 · 2025-10-30T09:38:49Z

@CodeRabbit review. Although this is merged, it is possible a bug was introduced. Events from IODispatcher might be lost occasionally. Re-check the whole change

coderabbitai · 2025-10-30T09:43:41Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the complete content of all modified dispatcher files
echo "=== AbstractIODispatcher.java ==="
cat core/src/main/java/io/questdb/network/AbstractIODispatcher.java

echo -e "\n=== IODispatcherLinux.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherLinux.java

echo -e "\n=== IODispatcherOsx.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherOsx.java

echo -e "\n=== IODispatcherWindows.java ==="
cat core/src/main/java/io/questdb/network/IODispatcherWindows.java

echo -e "\n=== Kqueue.java ==="
cat core/src/main/java/io/questdb/network/Kqueue.java

Length of output: 82256

@bluestreak01, I'll perform a thorough analysis of the changes to identify potential event loss issues.

🏁 Script executed:

#!/bin/bash
# Search for all usages of nextOpId and nextEventId to trace ID generation patterns
echo "=== nextOpId() call sites ==="
rg -n "nextOpId\(\)" --type java

echo -e "\n=== nextEventId() call sites ==="
rg -n "nextEventId\(\)" --type java

echo -e "\n=== Binary searches on OPM_ID ==="
rg -n "binarySearch.*OPM_ID" --type java

echo -e "\n=== Early returns/continues in processRegistrations ==="
rg -n -A2 -B2 "processRegistrations.*continue" --type java

Length of output: 2377

@bluestreak01, I've identified a critical bug in the ID generation logic that can cause event loss. Here's my analysis:

🔴 Critical Bug: Broken ID Ordering Invariant

Root Cause

In all three dispatcher implementations (IODispatcherLinux, IODispatcherOsx, IODispatcherWindows), the processRegistrations() method calls nextOpId() before validating whether the operation should proceed:

IODispatcherLinux.java (lines 241-256):

private boolean processRegistrations(long timestamp) {
    // ...
    while ((cursor = interestSubSeq.next()) > -1) {
        final IOEvent<C> event = interestQueue.get(cursor);
        final C context = event.context;
        final int requestedOperation = event.operation;
        final long srcOpId = context.getAndResetHeartbeatId();
        interestSubSeq.done(cursor);

        useful = true;
        final long opId = nextOpId();  // ❌ ID GENERATED HERE
        final long fd = context.getFd();

        int operation = requestedOperation;
        final SuspendEvent suspendEvent = context.getSuspendEvent();
        int epollCmd = EpollAccessor.EPOLL_CTL_MOD;
        if (requestedOperation == IOOperation.HEARTBEAT) {
            assert srcOpId != -1;

            int heartbeatRow = pendingHeartbeats.binarySearch(srcOpId, OPM_ID);
            if (heartbeatRow < 0) {
                LOG.info().$("could not find heartbeat, connection must be already closed [fd=").$(fd)
                        .$(", srcId=").$(srcOpId)
                        .I$();
                continue;  // ❌ EARLY EXIT - ID WASTED, ORDERING BROKEN
            }

The same pattern exists in IODispatcherOsx.java (line 261) and IODispatcherWindows.java (line 68).

Why This Causes Event Loss

ID Generation: When nextOpId() is called, it increments idSeq and returns an even number (e.g., 200)
Early Continue: If the heartbeat reference is not found, the method continues without adding anything to pending
Wasted ID: The generated ID (200) is never used
Next Iteration: The next valid operation gets ID 202
Ordering Break: If operations are later looked up by their original ID, or if the binary search expects strict monotonic ordering in the insertion sequence, events can be lost or lookups can fail

Monotonic Ordering Requirement

The code relies heavily on binary search over OPM_ID:

pending.binarySearch(id, OPM_ID) - appears 9 times across all dispatchers
pendingHeartbeats.binarySearch(srcOpId, OPM_ID) - appears 4 times
pendingEvents.binarySearch(id, ...) - appears 4 times

Binary search requires that "the range must be sorted" and "if it is not sorted, the results are undefined."

While the IDs in pending might still be monotonically increasing (since new rows are always appended), the gaps in ID sequence mean that:

IDs from epoll/kqueue/select events might reference wasted IDs
Binary searches expecting contiguous or predictable ID patterns will fail
Event correlation between different matrices becomes unreliable

Affected Code Paths

🏁 Script executed:

#!/bin/bash
# Find all places where nextOpId is called early and could be wasted
echo "=== processRegistrations with early continues ==="
rg -n -B5 -A15 "final long opId = nextOpId\(\)" core/src/main/java/io/questdb/network/

echo -e "\n=== All continue statements after nextOpId ==="
rg -n -B10 "nextOpId.*continue" core/src/main/java/io/questdb/network/ || echo "Pattern not found, checking manually..."

# Check for the specific pattern
echo -e "\n=== Heartbeat lookup failures leading to continue ==="
rg -n -A3 "could not find heartbeat" core/src/main/java/io/questdb/network/

Length of output: 8487

🏁 Script executed:

#!/bin/bash
# Check for other early continue paths after nextOpId is called
echo "=== All continue statements in processRegistrations ==="
rg -n -B15 -A2 "^\s*continue;" core/src/main/java/io/questdb/network/IODispatcher*.java | grep -A15 -B15 "processRegistrations"

echo -e "\n=== Plaintext buffered early return path ==="
rg -n -B5 -A3 "isMorePlaintextBuffered.*publishOperation.*continue" core/src/main/java/io/questdb/network/

Length of output: 101

🏁 Script executed:

#!/bin/bash
# Extract the full processRegistrations method from each dispatcher
echo "=== IODispatcherLinux processRegistrations ==="
sed -n '/private boolean processRegistrations/,/^    }/p' core/src/main/java/io/questdb/network/IODispatcherLinux.java | grep -n "continue"

echo -e "\n=== IODispatcherOsx processRegistrations ==="
sed -n '/private boolean processRegistrations/,/^    }/p' core/src/main/java/io/questdb/network/IODispatcherOsx.java | grep -n "continue"

echo -e "\n=== IODispatcherWindows processRegistrations ==="
sed -n '/private boolean processRegistrations/,/^    }/p' core/src/main/java/io/questdb/network/IODispatcherWindows.java | grep -n "continue"

echo -e "\n=== Now check isMorePlaintextBuffered pattern ==="
rg -n "isMorePlaintextBuffered" core/src/main/java/io/questdb/network/IODispatcher*.java

Length of output: 958

🏁 Script executed:

#!/bin/bash
# Get context around isMorePlaintextBuffered to see if it's after nextOpId
echo "=== IODispatcherLinux isMorePlaintextBuffered context ==="
rg -n -B10 -A5 "isMorePlaintextBuffered" core/src/main/java/io/questdb/network/IODispatcherLinux.java

echo -e "\n=== IODispatcherOsx isMorePlaintextBuffered context ==="
rg -n -B10 -A5 "isMorePlaintextBuffered" core/src/main/java/io/questdb/network/IODispatcherOsx.java

echo -e "\n=== IODispatcherWindows isMorePlaintextBuffered context ==="
rg -n -B10 -A5 "isMorePlaintextBuffered" core/src/main/java/io/questdb/network/IODispatcherWindows.java

Length of output: 2767

🔴 Confirmed: TWO Bug Locations Per Dispatcher

All three dispatchers (IODispatcherLinux, IODispatcherOsx, IODispatcherWindows) waste generated operation IDs in two locations within processRegistrations():

Bug #1: Heartbeat Not Found

Lines: 258 (Linux), 273 (OSX), 80 (Windows)
Scenario: When a heartbeat response arrives but the original heartbeat entry is missing from pendingHeartbeats
Impact: The opId generated at the start of the loop is wasted

Bug #2: Plaintext Already Buffered

Lines: 282 (Linux), 296 (OSX), 103 (Windows)
Scenario: When a READ operation is registered but TLS socket already has decrypted plaintext buffered
Impact: Operation is immediately published without adding to pending, wasting the opId

🔍 Event Loss Mechanism

Here's how events can be lost:

Scenario:

1. Connection A registers READ → opId = 100 (even number)
   - Plaintext buffered → continue (ID 100 wasted)
2. Connection B registers WRITE → opId = 102
   - Added to pending[0] with ID 102
3. Connection C registers READ → opId = 104
   - Added to pending[1] with ID 104
4. epoll/kqueue returns event with ID 102
5. Binary search: pending.binarySearch(102, OPM_ID)
   - pending has: [102, 104]
   - Search finds row 0 ✓

Later:
6. Connection D registers HEARTBEAT → opId = 106
   - Heartbeat not found → continue (ID 106 wasted)
7. Connection E registers READ → opId = 108
   - Added to pending[2] with ID 108
8. Connection F registers WRITE → opId = 110
   - Added to pending[3] with ID 110
9. epoll/kqueue returns event with ID 106 (delayed/resent)
10. Binary search: pending.binarySearch(106, OPM_ID)
    - pending has: [102, 104, 108, 110]
    - 106 not found → returns negative insertion point
11. LOG.critical "internal error: epoll returned unexpected id"
12. Event LOST ❌

While unlikely that a wasted ID would be returned by the kernel, the real problem is that the code assumes monotonic ordering for correctness. The gaps indicate a design flaw.

🔧 Recommended Fix

Move nextOpId() call to after all validation checks:

private boolean processRegistrations(long timestamp) {
    long cursor;
    boolean useful = false;
    while ((cursor = interestSubSeq.next()) > -1) {
        final IOEvent<C> event = interestQueue.get(cursor);
        final C context = event.context;
        final int requestedOperation = event.operation;
        final long srcOpId = context.getAndResetHeartbeatId();
        interestSubSeq.done(cursor);

        useful = true;
        final long fd = context.getFd();
        
        // ❌ REMOVED: final long opId = nextOpId();  

        int operation = requestedOperation;
        final SuspendEvent suspendEvent = context.getSuspendEvent();
        int epollCmd = EpollAccessor.EPOLL_CTL_MOD;
        if (requestedOperation == IOOperation.HEARTBEAT) {
            assert srcOpId != -1;

            int heartbeatRow = pendingHeartbeats.binarySearch(srcOpId, OPM_ID);
            if (heartbeatRow < 0) {
                LOG.info().$("could not find heartbeat, connection must be already closed [fd=").$(fd)
                        .$(", srcId=").$(srcOpId)
                        .I$();
                continue;  // ✓ Early exit OK - no ID generated yet
            } else {
                final long opId = nextOpId();  // ✓ Generate ID only when needed
                epollCmd = EpollAccessor.EPOLL_CTL_ADD;
                operation = (int) pendingHeartbeats.get(heartbeatRow, OPM_OPERATION);
                
                // ... rest of heartbeat handling with opId ...
            }
        } else {
            if (requestedOperation == IOOperation.READ && suspendEvent == null && context.getSocket().isMorePlaintextBuffered()) {
                publishOperation(IOOperation.READ, context);
                continue;  // ✓ Early exit OK - no ID generated yet
            }

            final long opId = nextOpId();  // ✓ Generate ID only when needed
            
            // ... rest of normal operation handling with opId ...
        }
    }
    return useful;
}

📊 Additional Concerns

Non-atomic increment: idSeq++ in nextOpId() and nextEventId() is not thread-safe, but this appears acceptable since each dispatcher runs on a single thread.
Suspend event correlation: The pendingEvents matrix uses EVM_OPERATION_ID to store references to operation IDs. If operation IDs have gaps, suspend event lookups could be affected.
Test coverage: The PR has 0% coverage for Windows (8 new lines) and OSX (28 new lines), making these bugs difficult to detect.

✅ Summary

The bug exists in all three platform-specific dispatchers and can cause:

Event loss when wasted IDs are later returned by the kernel
Binary search failures due to broken monotonic ordering assumptions
Connection hangs when operations are not properly tracked

Recommendation: Apply the fix to move nextOpId() generation after all early-exit conditions in processRegistrations() for all three dispatchers.

Would you like me to create an issue to track this fix?

test(ilp): harden ILP/TCP tests

f36a9e8

puzpuzpuz self-assigned this Oct 8, 2025

puzpuzpuz added DO NOT MERGE These changes should not be merged to main branch Schrödinger's bug labels Oct 8, 2025

puzpuzpuz added 3 commits October 9, 2025 12:06

Fix op id order invariant in windows I/O dispatcher

dea88c3

Increate idle timeout

12a4ca2

Reuse operation id code across I/O dispatchers

6b2d3d4

puzpuzpuz added Bug Incorrect or unexpected behavior Core Related to storage, data type, etc. and removed DO NOT MERGE These changes should not be merged to main branch Schrödinger's bug labels Oct 9, 2025

puzpuzpuz changed the title ~~test(ilp): harden ILP/TCP tests~~ fix(core): potential connection leakage on Windows Oct 9, 2025

bluestreak01 reviewed Oct 9, 2025

View reviewed changes

core/src/main/java/io/questdb/cairo/pool/AbstractPool.java Show resolved Hide resolved

Add clarifying comment

8ca493b

Merge branch 'master' into puzpuzpuz_flaky_ilp_tests

7a432f6

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

core/src/main/java/io/questdb/network/Kqueue.java Show resolved Hide resolved

puzpuzpuz and others added 5 commits October 9, 2025 15:34

Fix data load call

d6f217a

Merge remote-tracking branch 'upstream/puzpuzpuz_flaky_ilp_tests' int…

8ce2047

…o puzpuzpuz_flaky_ilp_tests

Merge remote-tracking branch 'upstream/master' into puzpuzpuz_flaky_i…

f3cb500

…lp_tests

Address coderabbit comments

69455e8

tidy up

fffd570

bluestreak01 previously approved these changes Oct 9, 2025

View reviewed changes

bluestreak01 dismissed their stale review via fffd570 October 9, 2025 15:41

bluestreak01 approved these changes Oct 9, 2025

View reviewed changes

bluestreak01 merged commit eabbadc into master Oct 9, 2025
34 of 35 checks passed

bluestreak01 deleted the puzpuzpuz_flaky_ilp_tests branch October 9, 2025 17:28

coderabbitai bot mentioned this pull request Jan 20, 2026

chore(core): remove exception-based query pause mechanism #6670

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): potential connection leakage on Windows#6243

fix(core): potential connection leakage on Windows#6243
bluestreak01 merged 11 commits intomasterfrom
puzpuzpuz_flaky_ilp_tests

puzpuzpuz commented Oct 8, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Oct 8, 2025 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

bluestreak01 commented Oct 9, 2025

Uh oh!

coderabbitai bot commented Oct 9, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

glasstiger commented Oct 9, 2025

Uh oh!

Uh oh!

bluestreak01 commented Oct 30, 2025

Uh oh!

coderabbitai bot commented Oct 30, 2025

🔴 Critical Bug: Broken ID Ordering Invariant

Root Cause

Why This Causes Event Loss

Monotonic Ordering Requirement

Affected Code Paths

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	int heartbeatRow = pendingHeartbeats.binarySearch(srcOpId, OPM_ID);
	if (heartbeatRow < 0) {
	continue; // The connection is already closed.
	} else {

	protected void pendingAdded(int index) {
	pending.set(index, OPM_OPERATION, initialBias == IODispatcherConfiguration.BIAS_READ ? IOOperation.READ : IOOperation.WRITE);
	}

Conversation

puzpuzpuz commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

bluestreak01 commented Oct 9, 2025

Uh oh!

coderabbitai bot commented Oct 9, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

glasstiger commented Oct 9, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

bluestreak01 commented Oct 30, 2025

Uh oh!

coderabbitai bot commented Oct 30, 2025

🔴 Critical Bug: Broken ID Ordering Invariant

Root Cause

Why This Causes Event Loss

Monotonic Ordering Requirement

Affected Code Paths

🔴 Confirmed: TWO Bug Locations Per Dispatcher

Bug #1: Heartbeat Not Found

Bug #2: Plaintext Already Buffered

🔍 Event Loss Mechanism

🔧 Recommended Fix

📊 Additional Concerns

✅ Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

puzpuzpuz commented Oct 8, 2025 •

edited

Loading

coderabbitai bot commented Oct 8, 2025 •

edited

Loading