Add generic FileRegion support in io_uring stream channel by LuciferYang · Pull Request #16571 · netty/netty

LuciferYang · 2026-03-27T14:56:27Z

Motivation

AbstractIoUringStreamChannel only handles DefaultFileRegion (via splice). Any other FileRegion implementation hits UnsupportedOperationException from the parent's filterOutboundMessage.

This is inconsistent with EPOLL and NIO, which accept any FileRegion via a transferTo fallback. Frameworks like Apache Spark wrap DefaultFileRegion inside composite FileRegion implementations (e.g., MessageWithHeader) that work on NIO and EPOLL but crash on io_uring.

See #16570

Modification

For generic FileRegion (non-DefaultFileRegion or when splice is unsupported), let it pass through filterOutboundMessage into the outbound buffer, then convert it to a direct ByteBuf in 64 KB chunks via FileRegion.transferTo in the write path (scheduleWriteSingle / writeComplete0).

DefaultFileRegion + splice path is unchanged.
Data is chunked to limit memory usage — each chunk is read via transferTo into a capped ByteBufWritableByteChannel, then submitted as an io_uring async send.
Partial sends and EAGAIN preserve the chunk buffer for retry; unrecoverable errors release it and fail the write.
Empty FileRegions (count = 0) submit a 0-byte send so the completion path can remove them.

Also enabled testCustomFileRegion in IoUringSocketFileRegionTest and IoUringBufferRingSocketFileRegionTest.

Result

Any FileRegion implementation is now writable on io_uring, consistent with EPOLL and NIO.

Transport	`DefaultFileRegion`	Generic `FileRegion`
NIO	`FileChannel.transferTo`	`region.transferTo`
EPOLL	native sendfile	`region.transferTo` fallback
io_uring (before)	splice	UnsupportedOperationException
io_uring (after)	splice (unchanged)	chunked `transferTo` → ByteBuf → async send

LuciferYang · 2026-03-27T14:57:09Z

cc @normanmaurer

normanmaurer · 2026-03-30T11:00:05Z

@LuciferYang did you sign our ICLA ? https://netty.io/s/icla

LuciferYang · 2026-03-30T11:09:41Z

@LuciferYang did you sign our ICLA ? https://netty.io/s/icla

done

normanmaurer · 2026-03-30T14:04:56Z

+     * Maximum bytes per chunk when converting a generic {@link FileRegion}
+     * to {@link ByteBuf} for the io_uring async send path.
+     */
+    private static final int FILE_REGION_MAX_CHUNK_SIZE = 64 * 1024;


nit: Maybe add a system property ?

normanmaurer · 2026-03-30T14:05:27Z

+                }
+                long address = IoUring.memoryAddress(buf) + buf.readerIndex();
+                int length = buf.readableBytes();
+                ops = IoUringIoOps.newSend(fd, (byte) 0, 0, address, length, nextOpsId());


can we factor this out into another method and call the methods from here ?

LuciferYang · 2026-03-30T14:27:17Z

     */
-    private static final int FILE_REGION_MAX_CHUNK_SIZE = 64 * 1024;
+    private static final int FILE_REGION_MAX_CHUNK_SIZE =
+            SystemPropertyUtil.getInt("io.netty.iouring.fileRegionChunkSize", 64 * 1024);


Is this property name OK?

dreamlike-ocean · 2026-03-31T10:10:48Z

I have a few thoughts related to this PR for your consideration.
Perhaps we can explore them in the future

There was a previous report about splice performance in Proposal: configurable sendfile and pipe size for io_uring transport #15747.
I’m curious whether the current fallback implementation shows measurable performance differences compared to splice.

If the fallback path performs better under certain workloads, would it make sense to allow selecting it explicitly via a command-line option, instead of relying purely on automatic fallback?

If the chunk size exceeds the user-configured IO_URING_WRITE_ZERO_COPY_THRESHOLD, it may fall into a range where zero-copy is beneficial. In that case, would it make sense to route it through the io_uring_send_zc path instead?

LuciferYang · 2026-03-31T10:53:39Z

Thanks for the thoughtful suggestions, @dreamlike-ocean!

Splice vs fallback performance: Good point linking to Proposal: configurable sendfile and pipe size for io_uring transport #15747. In the current implementation, DefaultFileRegion takes the splice path when IoUring.isSpliceSupported() is true; otherwise it falls through to the same chunked transferTo → ByteBuf → send fallback as generic FileRegion. So the fallback path already serves as an implicit alternative when splice is unavailable. Making it explicitly selectable (e.g., via a config option to bypass splice even when supported) could be a useful follow-up, but I think that fits better as part of Proposal: configurable sendfile and pipe size for io_uring transport #15747 rather than in this PR.
Zero-copy send for large chunks: Yes, this is a valid optimization opportunity. Currently the fallback path submits chunks via IoUringIoOps.newSend() unconditionally — it doesn't check shouldWriteZeroCopy() / route through send_zc, even when the chunk size exceeds IO_URING_WRITE_ZERO_COPY_THRESHOLD. This could be added as a follow-up enhancement.

This PR focuses on correctness — making any FileRegion work on io_uring instead of throwing UnsupportedOperationException, consistent with NIO and EPOLL. Both of your suggestions are good directions for future performance optimization.

normanmaurer · 2026-04-01T08:51:57Z

@chrisvest PTAL

LuciferYang · 2026-04-08T06:25:19Z

What's the progress on this pr ...

chrisvest

Had some comments.

chrisvest · 2026-04-17T00:31:46Z

+                    try {
+                        ByteBufWritableByteChannel ch = new ByteBufWritableByteChannel(buf);
+                        int written = 0;
+                        while (written < chunkSize) {


We can use buf.writableBytes() > 0 as a condition and remove written

chrisvest · 2026-04-17T00:44:15Z

+                    }
+                } else {
+                    // Don't release the chunk buffer before checking the error —
+                    // for retryable errors (EAGAIN) we need to re-send the same data.


Where are we re-sending the same data? The FileRegion.transferred() got incremented in the scheduleWriteFileRegion call.

chrisvest · 2026-04-17T00:45:21Z

+                    } catch (Throwable cause) {
+                        releaseFileRegionChunkBuf();
+                        handleWriteError(cause);
+                        return true;


This return true looks redundant as we could fall through to the one below, right?

chrisvest · 2026-04-17T00:47:34Z

    @Override
    protected boolean supportsCustomFileRegion() {
-        return false;
+        return true;


We should add some tests that exercises the chunking, and also a test where we send two FileRegions in a row where at least the first one exercises the chunking.

The tests enabled by this flag only send 8 KiB of data, I think.

LuciferYang · 2026-04-17T06:26:38Z

All five addressed (9db54a0):

while (buf.writableBytes() > 0), written removed.
Added a note on FileRegion.transferred() — with the io_uring chunked fallback the counter advances when bytes are queued for submission, not when they're confirmed on the wire.
Reworded the EAGAIN comment. ioResult returns 0, and scheduleWriteFileRegion resubmits the same chunk buffer once POLLOUT fires (transferred() is already advanced, so the buffer holds the only copy of those bytes).
Dropped the inner try/catch — the outer catch handles the throw path the same way and falls through to return true.
Added testCustomFileRegionChunking and testTwoCustomFileRegionsWithChunking, and flipped IoUringBufferRingSocketFileRegionTest#supportsCustomFileRegion to true so the buffer-ring variant runs them too. Both tests wrap a DefaultFileRegion in a CountingFileRegion that delegates all methods but counts transferTo calls, so I can assert transferToCalls >= ceil(payload/chunkSize) — a mutation disabling chunking would fail.

A few small adjacent fixes came up while wiring this up:

Capped FILE_REGION_MAX_CHUNK_SIZE at 16 MiB so a bad io.netty.iouring.fileRegionChunkSize override can't OOM the process.
Included count and transferred in the "transferTo produced 0 bytes" ChannelException message so it's clear which region failed.

LuciferYang · 2026-04-21T03:59:34Z

@chrisvest bc2c593 try to fix the failed test

The chunking assertions in testCustomFileRegionChunking and testTwoCustomFileRegionsWithChunking expected at least 8 transferTo calls for a 512 KB payload with a 64 KB chunk, but CI only saw 1.

The tests run through AbstractComboTestsuiteTest, which cycles through three combos from IoUringSocketTestPermutation: io_uring+io_uring, io_uring+NIO client, and NIO+io_uring. The chunking code in AbstractIoUringStreamChannel.scheduleWriteFileRegion only runs when the client is IoUringSocketChannel. With an NIO client, NioSocketChannel.doWriteFileRegion passes the socket channel straight into region.transferTo(javaChannel(), position), and the JDK routes that through sendfile(2) — one kernel call copies the whole 512 KB, so transferTo is invoked exactly once. That's the correct behavior for NIO; the assertion just doesn't apply there.

Fix: after connect, capture boolean ioUringClient = cc instanceof IoUringSocketChannel; and gate the chunking count check with assumeTrue(ioUringClient, ...). The byte-count and exception assertions still run for all three combos, so data correctness is covered everywhere; only the chunking-specific count is skipped on the NIO-client combo.

IoUringSocketChannel is final and clientSocket() only hands out IoUringSocketChannel or NioSocketChannel, so the instanceof check covers every case the permutation can produce.

netty-project-bot · 2026-04-21T13:41:46Z

Could not create auto-port PR.
Got conflicts when cherry-picking onto 5.0.

LuciferYang · 2026-04-21T13:43:38Z

Thank you @normanmaurer and @chrisvest

LuciferYang · 2026-04-21T13:44:08Z

Could not create auto-port PR. Got conflicts when cherry-picking onto 5.0.

What do I need to do for this?

normanmaurer · 2026-04-21T14:02:59Z

@LuciferYang nothing ... PR for 5.0 is #16677

…16677) `AbstractIoUringStreamChannel` only handles `DefaultFileRegion` (via splice). Any other `FileRegion` implementation hits `UnsupportedOperationException` from the parent's `filterOutboundMessage`. This is inconsistent with EPOLL and NIO, which accept any `FileRegion` via a `transferTo` fallback. Frameworks like Apache Spark wrap `DefaultFileRegion` inside composite `FileRegion` implementations (e.g., `MessageWithHeader`) that work on NIO and EPOLL but crash on io_uring. See #16570 For generic `FileRegion` (non-`DefaultFileRegion` or when splice is unsupported), let it pass through `filterOutboundMessage` into the outbound buffer, then convert it to a direct `ByteBuf` in **64 KB chunks** via `FileRegion.transferTo` in the write path (`scheduleWriteSingle` / `writeComplete0`). - `DefaultFileRegion` + splice path is unchanged. - Data is chunked to limit memory usage — each chunk is read via `transferTo` into a capped `ByteBufWritableByteChannel`, then submitted as an io_uring async send. - Partial sends and EAGAIN preserve the chunk buffer for retry; unrecoverable errors release it and fail the write. - Empty FileRegions (count = 0) submit a 0-byte send so the completion path can remove them. Also enabled `testCustomFileRegion` in `IoUringSocketFileRegionTest` and `IoUringBufferRingSocketFileRegionTest`. Any `FileRegion` implementation is now writable on io_uring, consistent with EPOLL and NIO. | Transport | `DefaultFileRegion` | Generic `FileRegion` | |---|---|---| | NIO | `FileChannel.transferTo` | `region.transferTo` | | EPOLL | native sendfile | `region.transferTo` fallback | | io_uring (before) | splice | **UnsupportedOperationException** | | io_uring (after) | splice (unchanged) | chunked `transferTo` → ByteBuf → async send | --------- Co-authored-by: Norman Maurer <[email protected]> Co-authored-by: YangJie <[email protected]>

…s count() Motivation: The generic FileRegion fallback added in netty#16571 drains the region by looping on `buf.writableBytes() > 0` only. epoll's writeFileRegion() guards the same call with `if (region.transferred() >= region.count()) { in.remove(); return 0; }` -- the FileRegion contract permits implementations to assume transferTo() is not invoked past transferred() == count(). Calling transferTo() past completion can trigger side effects in custom AbstractFileRegion implementations -- for example emitting an extra framing prefix or trailer onto the wire -- which then corrupts the protocol on the receiving side. Modification: Tighten the io_uring drain-loop condition with `region.transferred() < region.count()` in addition to `buf.writableBytes() > 0`, mirroring epoll's short-circuit. Add IoUringSocketFileRegionTest#testFileRegionDrainStopsAtCompletion plus an OvershootDetectingFileRegion fixture that records every transferTo() call made past transferred() == count(). The test fails on the prior implementation and passes with this change. Result: io_uring's generic FileRegion path no longer over-calls transferTo(). Custom AbstractFileRegion implementations that rely on the documented contract (e.g. Apache Spark's SaslEncryption, where SparkSaslSuite fails deterministically under io_uring on 4.2.13) work as expected.

…s count() Motivation: The generic FileRegion fallback added in netty#16571 drains the region by looping on `buf.writableBytes() > 0` only. epoll's writeFileRegion() guards the same call with `if (region.transferred() >= region.count()) { in.remove(); return 0; }` -- the FileRegion contract permits implementations to assume transferTo() is not invoked past transferred() == count(). Calling transferTo() past completion can trigger side effects in custom FileRegion implementations that lazily emit per-chunk framing -- for example an encryption, compression, or framing layer that flushes a completed inner chunk on each call -- producing a phantom chunk header that lands on the wire and corrupts the receiver's framing. Modification: Tighten the io_uring drain-loop condition with `region.transferred() < region.count()` in addition to `buf.writableBytes() > 0`, mirroring epoll's short-circuit. Add IoUringSocketFileRegionTest#testFileRegionDrainStopsAtCompletion plus an OvershootDetectingFileRegion fixture that records every transferTo() call made past transferred() == count(). The test fails on the prior implementation and passes with this change. Result: io_uring's generic FileRegion path no longer over-calls transferTo(). Custom FileRegion implementations that rely on the documented contract work under io_uring with the same semantics they already have under NIO and epoll.

…tual bytes Allocate the full FILE_REGION_MAX_CHUNK_SIZE for the chunk buffer instead of `min(count() - transferred(), MAX)`. The count()-based sizing assumes that count() is an accurate upper bound on the bytes a FileRegion will write to its target -- this holds for DefaultFileRegion but is violated by encryption-on-write implementations like Spark's SaslEncryption.EncryptedMessage, which: - Returns count() = unencrypted message size (or buf.readableBytes() for its ByteBuf-backed variant, which decreases as the underlying buffer drains). - Emits encrypted output through transferTo() that is LARGER than count() advertises (per-chunk framing + cipher overhead, plus the buf.skipBytes side effect that mutates count() mid-stream). When the chunk buf is sized to count() - transferred() and that's smaller than the encrypted output, transferTo() returns the +1 fallback (advancing transferred by 1 to signal "I have more"), the buf is sent with only some of the encrypted bytes, and the completion handler sees `region.transferred() >= region.count()` (true now that count() has dropped to 0) and removes the entry from the outbound buffer. The unsent bytes still in the FileRegion's internal currentChunk are lost, the peer gets a partial SASL frame, and the connection stalls. Sizing the buffer to FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB) ensures small-payload encrypted writes fit in one chunk, so the +1 fallback never fires and the count() vs transferred() race never matters. For large payloads the buffer caps at MAX_CHUNK_SIZE as before -- no regression in direct-memory headroom. This is the second of two bugs in netty#16571 found while running Spark's SparkSaslSuite under io_uring; the first (drain loop overshooting transferred()==count()) is fixed in the previous commit. Verified: SparkSaslSuite + RpcIntegrationSuite (21 tests covering small RPCs, encrypted RPCs, encrypted FileRegion fetches, multi-message stream uploads up to 512 KiB) pass cleanly across three back-to-back runs in an OrbStack Linux container with --security-opt seccomp=unconfined.

…llback Motivation: The generic FileRegion fallback added in netty#16571 sizes its chunk buffer to `min(region.count() - region.transferred(), FILE_REGION_MAX_CHUNK_SIZE)`. That sizing treats count() as an upper bound on the bytes transferTo() will emit -- which holds for DefaultFileRegion but custom FileRegion implementations may violate it. An on-the-fly encryption or framing layer, for example, exposes count() as the source size while emitting larger output through transferTo() (ciphertext, per-chunk framing, etc.). With a too-small chunk buffer the region's output is truncated; if the implementation then advances transferred() to count() once its source is drained, the completion handler removes the entry from the outbound buffer while the rest of the bytes are still sitting inside the region. The peer sees a partial frame and the connection stalls. Modification: Allocate the full FILE_REGION_MAX_CHUNK_SIZE unconditionally. Add IoUringSocketFileRegionTest#testFileRegionEmittingMoreThanCount plus an EmitMoreThanCountFileRegion fixture that reports count = 5 while transferTo() emits 20 bytes in one call. The receiver-side byte count distinguishes the buggy path (5 bytes received) from the fixed path (20 bytes received). The test skips when the operator has configured a chunk size smaller than the region's per-call output -- the fix is bounded to outputs that fit in one chunk. Result: io_uring's generic FileRegion path no longer truncates the per-call output of custom FileRegion implementations whose actual emit size exceeds count(), so long as the per-call output fits in FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB, tunable via the io.netty.iouring.fileRegionChunkSize system property). Memory cost: every generic-FileRegion send now allocates the full chunk size up front even for tiny regions, but the buffer is per in-flight write and the direct-buffer allocator pools it on release.

…llback Motivation: The generic FileRegion fallback added in netty#16571 sizes its chunk buffer to `min(region.count() - region.transferred(), FILE_REGION_MAX_CHUNK_SIZE)`. That sizing treats count() as an upper bound on the bytes transferTo() will emit -- which holds for DefaultFileRegion but custom FileRegion implementations may violate it. An on-the-fly encryption or framing layer, for example, exposes count() as the source size while emitting larger output through transferTo() (ciphertext, per-chunk framing, etc.). With a too-small chunk buffer the region's output is truncated; if the implementation then advances transferred() to count() once its source is drained, the completion handler removes the entry from the outbound buffer while the rest of the bytes are still sitting inside the region. The peer sees a partial frame and the connection stalls. Modification: Allocate the full FILE_REGION_MAX_CHUNK_SIZE unconditionally. Add IoUringSocketFileRegionTest#testFileRegionEmittingMoreThanCount plus an EmitMoreThanCountFileRegion fixture that reports count = 5 while transferTo() emits 20 bytes in one call. The receiver-side byte count distinguishes the buggy path (5 bytes received) from the fixed path (20 bytes received). The test skips when the operator has configured a chunk size smaller than the region's per-call output -- the fix is bounded to outputs that fit in one chunk. Result: io_uring's generic FileRegion path no longer truncates the per-call output of custom FileRegion implementations whose actual emit size exceeds count(), so long as the per-call output fits in FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB, tunable via the io.netty.iouring.fileRegionChunkSize system property). Memory cost: every generic-FileRegion send now reserves the full chunk size up front even for tiny regions. The buffer is per in-flight write and is released when the write completes; with the default pooled/adaptive allocator it is recycled, while UnpooledByteBufAllocator pays a direct malloc/free per send.

…llback Motivation: The generic FileRegion fallback added in netty#16571 sizes its chunk buffer to `min(region.count() - region.transferred(), FILE_REGION_MAX_CHUNK_SIZE)`. That sizing treats count() as an upper bound on the bytes transferTo() will emit -- which holds for DefaultFileRegion but custom FileRegion implementations may violate it. An on-the-fly encryption or framing layer, for example, exposes count() as the source size while emitting larger output through transferTo() (ciphertext, per-chunk framing, etc.). With a too-small chunk buffer the region's output is truncated; if the implementation then advances transferred() to count() once its source is drained, the completion handler removes the entry from the outbound buffer while the rest of the bytes are still sitting inside the region. The peer sees a partial frame and the connection stalls. Modification: Allocate the full FILE_REGION_MAX_CHUNK_SIZE unconditionally. Add IoUringSocketFileRegionTest#testFileRegionEmittingMoreThanCount plus an EmitMoreThanCountFileRegion fixture that reports count = 5 while transferTo() emits 20 bytes in one call. With the fix applied the receiver observes all 20 bytes; against unpatched code the transport's 5-byte chunk buffer accepts only 5 bytes of the emitted chunk and the fixture's fail-fast short-write check throws IOException, which surfaces on the write future and fails the test. The test skips when the operator has configured a chunk size smaller than the region's per-call output -- the fix is bounded to outputs that fit in one chunk. Result: io_uring's generic FileRegion path no longer truncates the per-call output of custom FileRegion implementations whose actual emit size exceeds count(), so long as the per-call output fits in FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB, tunable via the io.netty.iouring.fileRegionChunkSize system property). Memory cost: every generic-FileRegion send now reserves the full chunk size up front even for tiny regions. The buffer is per in-flight write and is released when the write completes; with the default pooled/adaptive allocator it is recycled, while UnpooledByteBufAllocator pays a direct malloc/free per send.

…llback Motivation: The generic FileRegion fallback added in netty#16571 sizes its chunk buffer to `min(region.count() - region.transferred(), FILE_REGION_MAX_CHUNK_SIZE)`. That sizing treats count() as an upper bound on the bytes transferTo() will emit -- which holds for DefaultFileRegion but custom FileRegion implementations may violate it. An on-the-fly encryption or framing layer, for example, exposes count() as the source size while emitting larger output through transferTo() (ciphertext, per-chunk framing, etc.). With a too-small chunk buffer the region's output is truncated; if the implementation then advances transferred() to count() once its source is drained, the completion handler removes the entry from the outbound buffer while the rest of the bytes are still sitting inside the region. The peer sees a partial frame and the connection stalls. Modification: Allocate the full FILE_REGION_MAX_CHUNK_SIZE unconditionally. Add IoUringSocketFileRegionTest#testFileRegionEmittingMoreThanCount plus an EmitMoreThanCountFileRegion fixture that reports count = 5 while transferTo() emits 20 bytes in one call. With the fix applied the receiver observes all 20 bytes; against unpatched code the transport's 5-byte chunk buffer accepts only 5 bytes of the emitted chunk and the fixture's fail-fast short-write check throws IOException, which surfaces on the write future and fails the test. The test skips when the operator has configured a chunk size smaller than the region's per-call output -- the fix is bounded to outputs that fit in one chunk. Result: io_uring's generic FileRegion path no longer truncates the per-call output of custom FileRegion implementations whose actual emit size exceeds count(), so long as the per-call output fits in FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB, tunable via the io.netty.iouring.fileRegionChunkSize system property). Memory cost: every generic-FileRegion send now reserves the full chunk size up front even for tiny regions. The buffer is per in-flight write and is released when the write completes; with the default pooled/adaptive allocator it is recycled, while UnpooledByteBufAllocator pays a fresh native allocation/free per send.

…llback Motivation: The generic FileRegion fallback added in netty#16571 sizes its chunk buffer to `min(region.count() - region.transferred(), FILE_REGION_MAX_CHUNK_SIZE)`. That sizing treats count() as an upper bound on the bytes transferTo() will emit -- which holds for DefaultFileRegion but custom FileRegion implementations may violate it. An on-the-fly encryption or framing layer, for example, exposes count() as the source size while emitting larger output through transferTo() (ciphertext, per-chunk framing, etc.). With a too-small chunk buffer the region's output is truncated; if the implementation then advances transferred() to count() once its source is drained, the completion handler removes the entry from the outbound buffer while the rest of the bytes are still sitting inside the region. The peer sees a partial frame and the connection stalls. Modification: Allocate the full FILE_REGION_MAX_CHUNK_SIZE unconditionally. Widen the chunk-allocation try/catch from Exception to Throwable so an OutOfDirectMemoryError from alloc().directBuffer(...) -- or any Error from transferTo() -- still releases the chunk buffer (when already allocated) and surfaces on the write future instead of leaking and stalling the channel. Add SocketFileRegionTest#testFileRegionEmittingMoreThanCount plus an EmitMoreThanCountFileRegion fixture that reports count = 5 while transferTo() emits 20 bytes in one call. The test lives on the base class so every FileRegion-supporting transport (NIO, epoll, kqueue, io_uring) must satisfy the same invariant. With the fix applied the receiver observes all 20 bytes; against unpatched io_uring code the transport's 5-byte chunk buffer accepts only 5 bytes of the emitted chunk and the fixture's fail-fast short-write check throws IOException, which surfaces on the write future and fails the test. Introduce SocketFileRegionTest#maxFileRegionPerCallEmit() (default Long.MAX_VALUE) so transports that copy through an intermediate buffer can declare their per-call output cap. The two io_uring subclasses override it to CONFIGURED_CHUNK_SIZE so the test skips cleanly when an operator has shrunk the chunk size below the region's per-call output. Document the per-call-output cap on the FILE_REGION_MAX_CHUNK_SIZE field so operators tuning the system property know not to size it below the largest emit any FileRegion implementation can produce. Result: io_uring's generic FileRegion path no longer truncates the per-call output of custom FileRegion implementations whose actual emit size exceeds count(), so long as the per-call output fits in FILE_REGION_MAX_CHUNK_SIZE (default 64 KiB, tunable via the io.netty.iouring.fileRegionChunkSize system property). The new regression test is shared across all FileRegion-supporting transports. Memory cost: every generic-FileRegion send now reserves the full chunk size up front even for tiny regions. The buffer is per in-flight write and is released when the write completes; with the default pooled/adaptive allocator it is recycled, while UnpooledByteBufAllocator pays a fresh native allocation/free per send.

…s count() (#16826) Closes #16825. ### Motivation The generic `FileRegion` fallback added in #16571 drains the region only on `buf.writableBytes() > 0`: ```java while (buf.writableBytes() > 0) { long t = region.transferTo(ch, region.transferred()); if (t <= 0) { break; } } ``` epoll's equivalent at `AbstractEpollStreamChannel#writeFileRegion` guards the call with: ```java if (region.transferred() >= region.count()) { in.remove(); return 0; } ``` so epoll never invokes `transferTo()` past `transferred() == count()`. The io_uring fallback can, which violates the `FileRegion` contract for implementations that lazily emit per-chunk framing — the extra call builds an empty next chunk and ships its framing into the buffer, breaking the receiver. See #16825 for the full diagnosis. ### Modification Bound the inner drain loop with `region.transferred() < region.count()` in addition to `buf.writableBytes() > 0`, mirroring epoll's short-circuit. Add `IoUringSocketFileRegionTest#testFileRegionDrainStopsAtCompletion` plus an `OvershootDetectingFileRegion` fixture that records every `transferTo()` invocation made past `transferred() == count()`. The fixture writes one byte and advances `transferred` to `count` in one shot, exercising the buggy path (15 overshoots on default chunk size) and the fixed path (0 overshoots) deterministically. A few test-design notes for review: - Skips on non-io_uring client permutations — the drain loop is on the sending side. - Skips when `io.netty.iouring.fileRegionChunkSize` is set to 1, since that leaves no spare capacity for the overshoot to land in. - Asserts on the writer-side counter synchronously after `writeAndFlush(...).sync()`; no time-based wait. - Routes contract violations through `IOException` rather than `AssertionError` so they surface on the write future's cause through the transport's `catch (Exception)` instead of wedging the EventLoop. - `@Timeout(value = 30, unit = SECONDS)` as a safety net for any future regression that wedges `sync()`. ### Result io_uring's generic `FileRegion` path stops over-calling `transferTo()`. Custom `FileRegion` implementations that lazily emit per-chunk framing work under io_uring with the same semantics they already have under NIO and epoll.

#16832) …s count() (#16826) Closes #16825. The generic `FileRegion` fallback added in #16571 drains the region only on `buf.writableBytes() > 0`: ```java while (buf.writableBytes() > 0) { long t = region.transferTo(ch, region.transferred()); if (t <= 0) { break; } } ``` epoll's equivalent at `AbstractEpollStreamChannel#writeFileRegion` guards the call with: ```java if (region.transferred() >= region.count()) { in.remove(); return 0; } ``` so epoll never invokes `transferTo()` past `transferred() == count()`. The io_uring fallback can, which violates the `FileRegion` contract for implementations that lazily emit per-chunk framing — the extra call builds an empty next chunk and ships its framing into the buffer, breaking the receiver. See #16825 for the full diagnosis. Bound the inner drain loop with `region.transferred() < region.count()` in addition to `buf.writableBytes() > 0`, mirroring epoll's short-circuit. Add `IoUringSocketFileRegionTest#testFileRegionDrainStopsAtCompletion` plus an `OvershootDetectingFileRegion` fixture that records every `transferTo()` invocation made past `transferred() == count()`. The fixture writes one byte and advances `transferred` to `count` in one shot, exercising the buggy path (15 overshoots on default chunk size) and the fixed path (0 overshoots) deterministically. A few test-design notes for review: - Skips on non-io_uring client permutations — the drain loop is on the sending side. - Skips when `io.netty.iouring.fileRegionChunkSize` is set to 1, since that leaves no spare capacity for the overshoot to land in. - Asserts on the writer-side counter synchronously after `writeAndFlush(...).sync()`; no time-based wait. - Routes contract violations through `IOException` rather than `AssertionError` so they surface on the write future's cause through the transport's `catch (Exception)` instead of wedging the EventLoop. - `@Timeout(value = 30, unit = SECONDS)` as a safety net for any future regression that wedges `sync()`. io_uring's generic `FileRegion` path stops over-calling `transferTo()`. Custom `FileRegion` implementations that lazily emit per-chunk framing work under io_uring with the same semantics they already have under NIO and epoll. Co-authored-by: YangJie <[email protected]>

fix

1a87944

fix lint

79b960c

LuciferYang mentioned this pull request Mar 30, 2026

[SPARK-56279][CORE] Enable zero-copy sendfile for FileRegion in native Netty transports apache/spark#55087

Closed

normanmaurer added this to the 4.2.13.Final milestone Mar 30, 2026

normanmaurer added the needs-cherry-pick-5.0 This PR should be cherry-picked to 5.0 once merged. label Mar 30, 2026

normanmaurer reviewed Mar 30, 2026

View reviewed changes

Comment thread ...port-classes-io_uring/src/main/java/io/netty/channel/uring/AbstractIoUringStreamChannel.java

normanmaurer requested changes Mar 30, 2026

View reviewed changes

LuciferYang added 2 commits March 30, 2026 21:34

address comments

242c206

refine comments

f35d0cd

normanmaurer requested changes Mar 30, 2026

View reviewed changes

LuciferYang added 2 commits March 30, 2026 22:22

address comments

279e679

more

15a3d7e

LuciferYang commented Mar 30, 2026

View reviewed changes

normanmaurer requested a review from chrisvest March 30, 2026 16:32

Merge branch '4.2' into fix-iouring-fileregion-fallback

946a55f

normanmaurer approved these changes Apr 17, 2026

View reviewed changes

chrisvest reviewed Apr 17, 2026

View reviewed changes

LuciferYang force-pushed the fix-iouring-fileregion-fallback branch from 276cbfa to 28c2be7 Compare April 17, 2026 05:48

address comments

9db54a0

LuciferYang force-pushed the fix-iouring-fileregion-fallback branch from 28c2be7 to 9db54a0 Compare April 17, 2026 05:54

chrisvest approved these changes Apr 20, 2026

View reviewed changes

fix test

bc2c593

normanmaurer merged commit 8204af4 into netty:4.2 Apr 21, 2026
19 checks passed

chrisvest removed the needs-cherry-pick-5.0 This PR should be cherry-picked to 5.0 once merged. label Apr 21, 2026

This was referenced May 19, 2026

IoUring: generic FileRegion drain loop calls transferTo() past transferred() == count() #16825

Closed

IoUring: Stop generic FileRegion drain loop when transferred() reaches count() #16826

Merged

This was referenced May 19, 2026

io_uring generic-FileRegion fallback truncates output when transferTo() emits more bytes than count() #16829

Open

IoUring: Allocate the full chunk buffer for the generic FileRegion fallback #16830

Closed

normanmaurer mentioned this pull request May 19, 2026

IoUring: Stop generic FileRegion drain loop when transferred() reache… #16832

Merged

Uh oh!

Conversation

LuciferYang commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modification

Result

Uh oh!

LuciferYang commented Mar 27, 2026

Uh oh!

normanmaurer commented Mar 30, 2026

Uh oh!

Uh oh!

LuciferYang commented Mar 30, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dreamlike-ocean commented Mar 31, 2026

Uh oh!

LuciferYang commented Mar 31, 2026

Uh oh!

normanmaurer commented Apr 1, 2026

Uh oh!

LuciferYang commented Apr 8, 2026

Uh oh!

chrisvest left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LuciferYang commented Apr 17, 2026

Uh oh!

LuciferYang commented Apr 21, 2026

Uh oh!

Uh oh!

netty-project-bot commented Apr 21, 2026

Uh oh!

LuciferYang commented Apr 21, 2026

Uh oh!

LuciferYang commented Apr 21, 2026

Uh oh!

normanmaurer commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

LuciferYang commented Mar 27, 2026 •

edited

Loading