Fix kevent(..) failed: Invalid argument#13503
Conversation
Motivation: While I was investating Armeria CI build failures on macOS, many `kevent(..) failed: Invalid argument` error messages were produced on the failed tests. https://ge.armeria.dev/s/h2fz4h4th66we/tests/task/:core:shadedTest/details/com.linecorp.armeria.server.HttpServerNoKeepAliveTest/shouldDisconnectWhenConnectionCloseIsIncluded(SessionProtocol)%5B4%5D?focused-execution=1&top-execution=1#L7 After adding some debug messages, I figured out that the error raised when a large number such `Integer.MAX_VALUE` is set as a timeout. The long timeout is derived from a scheduled task whose timeout is `Long.MAX_VALUE`. We set `Long.MAX_VALUE` to `DnsNameResolver` to disable a query timeout. https://github.com/netty/netty/blob/cbc995865ed1e082d5581744e9c90c6d119652b1/resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java#L282-L293 https://github.com/line/armeria/blob/2bfe424b652a16f603e2ba1ca075bf992f80f24e/core/src/test/java/com/linecorp/armeria/internal/client/dns/DefaultDnsResolverTest.java#L91-L93 Failed to find a specification for the timeout value of `kevent()`. However, there was a note in a man page that the timeout value is limited to 24 hours. https://man.freebsd.org/cgi/man.cgi?query=kevent&apropos=0&sektion=0&manpath=FreeBSD+6.1-RELEASE&format=html#end Modifications: - Limit the maximum allowed timeout for `kevent()` to 24 hours. - Skip to schedule a query timeout when the value is `Long.MAX_VALUE` Result: `kevent(..) failed: Invalid argument` no longer raises when a task is scheduled with a large value.
|
Please let me know if there is a good way to test these changes. |
resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java
Outdated
Show resolved
Hide resolved
transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/KQueueEventLoop.java
Show resolved
Hide resolved
|
This is the build result run with the change in this PR. |
Motivation: The build failures on macOS had two issues. 1. A deadlock on the JaCoCo task that prevented tests from running on macOS. 2. Blocked event loops that were caused by `kevent(..) Invalid Argument`. https://github.com/netty/netty/blob/830166292be28e64ec6b43170cff40a31a442ad4/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/KQueueEventLoop.java#L399-L405 A patch for the bug was submmited to netty/netty#13503 Modifications: - Fixed DNS builder and tests to use `-1` in order to disable a query timeout. Long.MAX_VALUE could causes `kevent(..) Invalid Argument`. - Nomarilized `Long.MAX_VALUE` to `-1` - Removed a depedency between `reportTask` and `testTask.path` that caused the deadlock with the current Gradle version. The dependency looked redundant. The JaCoCo task runs without it. - Upgrade deprecated Codecov GitHub Actions plugin. The old version is no longer working. - Miscellaneous) Added debug message to `AbstractServerCall` to find out the cause of line#5035 Result: - Code coverage is correctly reported by Codecov. - You no longer see `kevent(..) Invalid Argument` when testing Armeria on macOS.
| int delaySeconds = (int) min(totalDelay / 1000000000L, Integer.MAX_VALUE); | ||
| return kqueueWait(delaySeconds, (int) min(totalDelay - delaySeconds * 1000000000L, Integer.MAX_VALUE)); | ||
| int delaySeconds = (int) min(totalDelay / 1000000000L, KQUEUE_MAX_TIMEOUT_SECONDS); | ||
| int delayNanos = (int) (totalDelay % 1000000000L); |
There was a problem hiding this comment.
Don't we also need to subtract the delaySeconds ?
There was a problem hiding this comment.
Agree, it seems a typo here
There was a problem hiding this comment.
Doesn't the remainder of modular operation (% 1000000000L) become the nanos' part of totalDelay? Correct me if I'm wrong.
There was a problem hiding this comment.
Wrote a unit test and it passed.
@Test
void testNanos() {
final long nanoTime = System.nanoTime();
final long seconds = nanoTime / 1_000_000_000L;
final long nanos1 = nanoTime % 1_000_000_000L;
final long nanos2 = nanoTime - seconds * 1_000_000_000L;
assertThat(nanos2).isEqualTo(nanos1);
}There was a problem hiding this comment.
This is fine because KQUEUE_MAX_TIMEOUT_SECONDS is 24 hours minus 1 second, and the nanosecond component cannot exceed one second. Keeping the total under 24 hours.
| int delaySeconds = (int) min(totalDelay / 1000000000L, Integer.MAX_VALUE); | ||
| return kqueueWait(delaySeconds, (int) min(totalDelay - delaySeconds * 1000000000L, Integer.MAX_VALUE)); | ||
| int delaySeconds = (int) min(totalDelay / 1000000000L, KQUEUE_MAX_TIMEOUT_SECONDS); | ||
| int delayNanos = (int) (totalDelay % 1000000000L); |
There was a problem hiding this comment.
This is fine because KQUEUE_MAX_TIMEOUT_SECONDS is 24 hours minus 1 second, and the nanosecond component cannot exceed one second. Keeping the total under 24 hours.
Motivation: The build failures on macOS had two issues. 1. A deadlock on the JaCoCo task that prevented tests from running on macOS. `-Pcoverage` option was only used on macOS for CI builds. 2. Blocked event loops that were caused by `kevent(..) Invalid Argument`. https://github.com/netty/netty/blob/830166292be28e64ec6b43170cff40a31a442ad4/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/KQueueEventLoop.java#L399-L405 A patch for the bug was submitted to netty/netty#13503 Modifications: - Fixed DNS builder and tests to use `-1` in order to disable a query timeout. Long.MAX_VALUE could cause `kevent(..) Invalid Argument`. - Removed a dependency between `reportTask` and `testTask.path` that caused the deadlock with the current Gradle version. It seemed the dependency was redundant. The JaCoCo task now runs without it. - Upgrade deprecated Codecov GitHub Actions plugin. The old version is no longer working. - Miscellaneous) Added debug message to `AbstractServerCall` to find out the cause of #5035 Result: - Code coverage is correctly reported by Codecov. - You no longer see `kevent(..) Invalid Argument` when testing Armeria on macOS.
Motivation: While I was investigating Armeria CI build failures on macOS, many `kevent(..) failed: Invalid argument` error messages were produced on the failed tests. https://ge.armeria.dev/s/h2fz4h4th66we/tests/task/:core:shadedTest/details/com.linecorp.armeria.server.HttpServerNoKeepAliveTest/shouldDisconnectWhenConnectionCloseIsIncluded(SessionProtocol)%5B4%5D?focused-execution=1&top-execution=1#L7 After adding some debug messages, I figured out that the error raised when a large number such as `Integer.MAX_VALUE` is set as a timeout. https://github.com/netty/netty/blob/cbc995865ed1e082d5581744e9c90c6d119652b1/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/KQueueEventLoop.java#L169-L171 The long timeout is derived from a scheduled task whose timeout is `Long.MAX_VALUE`. We set `Long.MAX_VALUE` to `DnsNameResolver` in order to disable a query timeout for testing. https://github.com/netty/netty/blob/cbc995865ed1e082d5581744e9c90c6d119652b1/resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java#L282-L293 https://github.com/line/armeria/blob/2bfe424b652a16f603e2ba1ca075bf992f80f24e/core/src/test/java/com/linecorp/armeria/internal/client/dns/DefaultDnsResolverTest.java#L91-L93 Failed to find a specification for the timeout value of `kevent()`. However, there was a note on a man page that says the timeout value is limited to 24 hours. https://man.freebsd.org/cgi/man.cgi?query=kevent&apropos=0&sektion=0&manpath=FreeBSD+6.1-RELEASE&format=html#end Modifications: - Limit the maximum allowed timeout for `kevent()` to 24 hours. - Skip scheduling a query timeout when the value is `Long.MAX_VALUE` Result: `kevent(..) failed: Invalid argument` no longer raises when a task is scheduled with a large value.
Motivation:
While I was investigating Armeria CI build failures on macOS, many
kevent(..) failed: Invalid argumenterror messages were produced on the failed tests.https://ge.armeria.dev/s/h2fz4h4th66we/tests/task/:core:shadedTest/details/com.linecorp.armeria.server.HttpServerNoKeepAliveTest/shouldDisconnectWhenConnectionCloseIsIncluded(SessionProtocol)%5B4%5D?focused-execution=1&top-execution=1#L7
After adding some debug messages, I figured out that the error raised when a large number such as
Integer.MAX_VALUEis set as a timeout.netty/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/KQueueEventLoop.java
Lines 169 to 171 in cbc9958
The long timeout is derived from a scheduled task whose timeout is
Long.MAX_VALUE. We setLong.MAX_VALUEtoDnsNameResolverin order to disable a query timeout for testing.netty/resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java
Lines 282 to 293 in cbc9958
Failed to find a specification for the timeout value of
kevent(). However, there was a note on a man page that says the timeout value is limited to 24 hours.https://man.freebsd.org/cgi/man.cgi?query=kevent&apropos=0&sektion=0&manpath=FreeBSD+6.1-RELEASE&format=html#end
Modifications:
kevent()to 24 hours.Long.MAX_VALUEResult:
kevent(..) failed: Invalid argumentno longer raises when a task is scheduled with a large value.