Skip to content

core: delay sending cancel request on client-side when deadline expires#6328

Merged
ran-su merged 19 commits intogrpc:masterfrom
ran-su:race-ddl-cancel
Dec 16, 2019
Merged

core: delay sending cancel request on client-side when deadline expires#6328
ran-su merged 19 commits intogrpc:masterfrom
ran-su:race-ddl-cancel

Conversation

@ran-su
Copy link
Copy Markdown
Member

@ran-su ran-su commented Oct 23, 2019

When deadline expires, both the client and the server try to cancel the stream. There is a race between server receiving the cancellation from the client and the server cancelling the stream, which changes the final status of the stream from the server's perspective. If the former wins, server sees CANCELLED. If the latter wins, server sees DEADLINE_EXCEEDED.

Because ServerCall.Listener doesn't pass the final status to the server-side application, this ambiguity isn't a problem in most cases. However, the status is passed to ServerStreamTracer, and thus will cause confusion in stats.

This PR delays the client-side cancellation in case of deadline exceeded by a small margin, thus makes DEADLINE_EXCEEDED be reported most of the time.

b/118879795

@ran-su ran-su requested a review from zhangkun83 October 23, 2019 22:32
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
@zhangkun83 zhangkun83 requested a review from ejona86 October 24, 2019 17:35
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
@ran-su ran-su requested a review from zhangkun83 October 24, 2019 22:19
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
@ran-su ran-su requested a review from zhangkun83 October 25, 2019 23:41
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
@ran-su ran-su requested a review from zhangkun83 October 29, 2019 21:29
Copy link
Copy Markdown
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also handle the case where the

Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java
@ran-su ran-su requested a review from zhangkun83 November 1, 2019 21:35
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ManagedChannelImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java Outdated
Comment thread core/src/test/java/io/grpc/internal/ClientCallImplTest.java
@ran-su ran-su requested a review from zhangkun83 November 8, 2019 21:25
Copy link
Copy Markdown
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ejona86 it's ready for you to review now.

Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated
stream.cancel(statusFromCancelled(context));
} else {
if (deadlineCancellationSendToServerFuture == null) {
deadlineCancellationSendToServerFuture = deadlineCancellationExecutor.schedule(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything here that triggers the application to be called immediately when the deadline is exceeded. Won't this delay all deadline notifications to the application by 1 second?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All notifications to the application are sent in startDeadlineNotifyApplicationTimer(), which follows the initial deadline.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case calloption's deadline equals to the context deadline, startDeadlineNotifyApplicationTimer() is not called when the ClientCall is started.

deadlineCancellationNotifyApplicationFuture =
startDeadlineNotifyApplicationTimer(effectiveDeadline, observer);
} else {
if (cancellationListener instanceof ClientCallImpl.ContextCancellationListener) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always the case. You can change cancellationListener's type to ContextCancellationListener to avoid the casting.

startDeadlineNotifyApplicationTimer(effectiveDeadline, observer);
} else {
if (cancellationListener instanceof ClientCallImpl.ContextCancellationListener) {
((ContextCancellationListener) cancellationListener).setObserver(observer);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to move line 320 (context.addListener()) to after this else-block, so that if the deadline is exceeded immediately, observer can still be notified.

Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
if (observer != null) {
Status status = statusFromCancelled(context);
deadlineCancellationSendToServerFuture = startDeadlineSendCancelToServerTimer(status);
executeCloseObserverInContext(observer, status);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are repeated in DeadlineExceededNotifyApplicationTimer. Since this is the core logic of delayed cancellation, it should be in its own method (delayedCancelOnDeadlineExceeded(status, observer)), and the two cases just call it.

}

callExecutor.execute(new CloseInContext());
private void executeCloseObserverInContext(final Listener<RespT> observer, final Status status) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other two call sites of closeObserver() at the start of startInternal() that can be simplified by calling this method.

Copy link
Copy Markdown
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejona86. I believe @ran-su has addressed your last comment. You can take another look.

TimeUnit.NANOSECONDS);
}

private Status buildDeadlineExceededStatusWithRemainingNanos(long remainingNanos) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this method is used only once, and the only caller is very short. It may be more readable as an inline codeblock at the call site.

}
}

private ScheduledFuture<?> startDeadlineSendCancelToServerTimer(final Status status) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now that there is only one caller of this method, it would be more readable to merge it into its caller.

// cancelling the stream, which changes the final status of the stream from the server's
// perspective.
// Mitigate this by introduce a delay for client to send the cancel request.(b/118879795)
// This only impacts server statistics.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language here is a bit fragmented. How about

When a deadline is exceeded, there is a race between the server receiving the cancellation from the client and the server cancelling the stream itself. If the client's cancellation is received first, then the stream's status will be CANCELLED instead of DEADLINE_EXCEEDED. This prevents server monitoring from noticing high rate of DEADLINE_EXCEEDED, a common monitoring metric (b/118879795). Mitigate this by delayed sending of the client's cancellation.

@Override
public void cancelled(Context context) {
stream.cancel(statusFromCancelled(context));
public void cancelled(final Context context) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove final (revert change)

}
}

@SuppressWarnings("unchecked")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

if (context.getDeadline() == null || !context.getDeadline().isExpired()) {
stream.cancel(statusFromCancelled(context));
} else {
if (observer != null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very comfortable with this doing nothing if observer is null. It looks like an optimization, but it requires multiple pieces to behave in a particular way. I think it would be better to add the optimization to delayedCancelOnDeadlineExceeded(): if deadlineCancellationSendToServerFuture is already non-null (the method has already been called), return early. Note there's a race there, but there's a race there already (and so I'll have another comment about it elsewhere).

Changing this would also mean we could pass the Listener in the constructor and get rid of setObserver().

Comment thread core/src/main/java/io/grpc/internal/ClientCallImpl.java
Copy link
Copy Markdown
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One important change, but should be easy.

deadlineCancellationNotifyApplicationFuture =
startDeadlineNotifyApplicationTimer(effectiveDeadline, observer);
}
cancellationListener = new ContextCancellationListener(observer);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is thread-safe, since this field is read in removeContextListenerAndCancelDeadlineFuture(). I think you need to just move this initialization up higher before stream.start().

@ran-su ran-su merged commit f6544bf into grpc:master Dec 16, 2019
@ran-su ran-su deleted the race-ddl-cancel branch December 16, 2019 17:58
@lock lock Bot locked as resolved and limited conversation to collaborators Mar 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants