Skip to content

Fixes race condition in Python server shutdown#13786

Merged
kpayson64 merged 1 commit intogrpc:v1.8.xfrom
kpayson64:fix_python_server_race
Dec 14, 2017
Merged

Fixes race condition in Python server shutdown#13786
kpayson64 merged 1 commit intogrpc:v1.8.xfrom
kpayson64:fix_python_server_race

Conversation

@kpayson64
Copy link
Copy Markdown
Contributor

@kpayson64 kpayson64 commented Dec 14, 2017

When we set the call state to "CANCELLED" after
grpc_cancel_all_calls, we would block other start batch
operations from happening. The rpc_state for the cancelled
call would still be in the server's rpc_states set, but it
would never get removed because there were no active batches
for the call, and the only place we remove from rpc_states is
when a batch completes.

It is better to rely on c-core's cancellation. Once a call
is cancelled, all subsequent ops on that call will return
immediately with a cancellation error.

The "RLock()" change is due to the possibility that _on_call_completed
gets invoked immediately when the call has already completed when the
rpc_future callback is created.

Fixes #12668

Copy link
Copy Markdown
Contributor

@nathanielmanistaatgoogle nathanielmanistaatgoogle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content looks good, and hooray for code deletion and fulfilled TODOs.

In your commit log message: if the threading.RLock introduced in this commit is "second", which one is "first"?

Also: missing apostrophe in "be to rely on c-cores cancellation".

@nathanielmanistaatgoogle
Copy link
Copy Markdown
Contributor

How sure are we that master is the right choice for this and not v1.8.x?

@kpayson64 kpayson64 changed the base branch from master to v1.8.x December 14, 2017 19:59
@kpayson64 kpayson64 force-pushed the fix_python_server_race branch from 84491b0 to 67fe3b0 Compare December 14, 2017 20:06
When we set the call state to "CANCELLED" after
grpc_cancel_all_calls, we would block other start batch
operations from happening. The rpc_state for the cancelled
call would still be in the server's rpc_states set, but it
would never get removed because there were no active batches
for the call, and the only place we remove from rpc_states is
when a batch completes.

It is better to rely on c-core's cancellation. Once a call
is cancelled, all subsequent ops on that call will return
immediately with a cancellation error.

The RLock() change is due to the possibility that
_on_call_completed
gets invoked immediately when the call has already completed when the
rpc_future callback is created.
@kpayson64 kpayson64 force-pushed the fix_python_server_race branch from 67fe3b0 to 8179b95 Compare December 14, 2017 20:07
@kpayson64
Copy link
Copy Markdown
Contributor Author

I've updated the commit message and put this on 1.8.x

Before 16/1000 timeouts
After 0/1000 timeouts

Copy link
Copy Markdown
Contributor

@nathanielmanistaatgoogle nathanielmanistaatgoogle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much!

@kpayson64 kpayson64 merged commit f5649c7 into grpc:v1.8.x Dec 14, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants