netty: http2 server transport graceful shutdown sends 2 GOAWAYs#4227
netty: http2 server transport graceful shutdown sends 2 GOAWAYs#4227dapengzhang0 merged 5 commits intogrpc:masterfrom
Conversation
| decoder().flowController().initialWindowSize(connection().connectionStream()))); | ||
| } | ||
| } else if (data == MAX_CONNECTION_AGE_PING) { | ||
| checkNotNull(maxAgeShutdownRunner, "maxAgeShutdownRunner"); |
There was a problem hiding this comment.
We should not throw a normal runtime exception in response to something received (we could throw a Http2Exception though). Logging a warning would be fine though.
There was a problem hiding this comment.
It will never throw a RuntimeException unless there's a bug in the code. If there's a bug in the code, it will throw NPE anyway even if checkNotNull is not there. The checkNotNull here is just an assert without -ea flag. checkNotNull and assert can help find a bug, but logging a warning may not be that helpful. There are checkNotNull even in the constructors of Http2Exception.
There was a problem hiding this comment.
Then make it an assert or manually throw new AssertionError(). checkNotNull is best used when verifying your input, and telling the caller they made a mistake.
| @CheckForNull | ||
| private GracefulShutdownRunner maxAgeShutdownRunner; | ||
| @CheckForNull | ||
| private GracefulShutdownRunner maxIdleShutdownRunner; |
There was a problem hiding this comment.
Don't use two separate flows for the two different causes for shutdown. It doesn't matter why we are shutting down. If we're already shutting down and we want to shutdown again, then do nothing.
|
|
||
| private final class GracefulShutdownRunner { | ||
|
|
||
| ChannelHandlerContext ctx; |
There was a problem hiding this comment.
I see no reason to save ctx here. Pass it in explicitly to each method.
| private final class GracefulShutdownRunner { | ||
|
|
||
| ChannelHandlerContext ctx; | ||
| long payload; |
| } | ||
| } | ||
|
|
||
| private final class GracefulShutdownRunner { |
There was a problem hiding this comment.
Why is this a "runner"? It seems like run is just being used as a generic "do" method. Instead, give it a better name, like start (because it won't actually finish when it returns) or sendGoAway.
Even more though, I think the run() shouldn't be on this class. Instead, make a method directly in NettyServerHandler like gracefulShutdown or startGracefulShutdown. It would issue the initial GOAWAY and save state (this class) for later.
There was a problem hiding this comment.
Removed Runner in name. Renamed run() by start().
| Http2Error.NO_ERROR.code(), | ||
| ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage), | ||
| ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
This flush is unnecessary. Another is following it.
| TimeUnit.NANOSECONDS); | ||
|
|
||
| encoder().writePing(ctx, false /* isAck */, payload, ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
I'd prefer the caller do the flush. The only time explicit flushes are necessary here is for writes due to timers. For writes caused by reads, we do flush in onReadComplete. For writes coming from the application, it ends up flushing.
|
|
||
| long gracefulShutdownPingTimeout = GRACEFUL_SHUTDOWN_PING_TIMEOUT_NANOS; | ||
| if (graceTimeInNanos != null) { | ||
| deadline = ticker.read() + graceTimeInNanos; |
There was a problem hiding this comment.
Count grace time starting at secondGoAwayAndClose(), since we'll still be accepting new requests until that point.
With that change, it also seems the ticker would no longer be necessary.
| Http2Error.NO_ERROR.code(), | ||
| ByteBufUtil.writeAscii(ctx.alloc(), goAwayMessage), | ||
| ctx.newPromise()); | ||
| ctx.flush(); |
There was a problem hiding this comment.
Ditto remove. You only need the flush in the gracefulShutdownPingTimeout handling (not for the ping receiving).
| long savedGracefulShutdownTimeMillis = gracefulShutdownTimeoutMillis(); | ||
| long gracefulShutdownTimeoutMillis = savedGracefulShutdownTimeMillis; | ||
| if (graceTimeInNanos != null) { | ||
| gracefulShutdownTimeoutMillis = TimeUnit.NANOSECONDS.toMillis(deadline - ticker.read()); |
There was a problem hiding this comment.
Why is this consulting deadline? I thought we agreed that graceTimeInNanos would be relative to the second GOAWAY.
There was a problem hiding this comment.
This was a partial fix. Removing ticker is in the upcoming commit.
|
@ejona86 Thanks for the review. PTAL. |
| decoder().flowController().initialWindowSize(connection().connectionStream()))); | ||
| } | ||
| } else if (data == GRACEFUL_SHUTDOWN_PING) { | ||
| if (gracefulShutdown == null) { |
There was a problem hiding this comment.
Oh, I remember why I put that comment. This is doing a check based on what is received. In your earlier response you said:
It will never throw a RuntimeException unless there's a bug in the code.
But the bug could be in the remote code, not in this code. That's why I had suggested a warning.
resolves #3442