Conversation
|
Friendly ping - this still needs to be merged, right? |
markdroth
left a comment
There was a problem hiding this comment.
Sorry for the delay.
This looks good. In practice, it seems like most callers of SetNotifyOnCancel() need to hold a ref to the call stack for their cancellation objects anyway; I suspect that the auth filters are actually the exception here, not the rule.
We should brainstorm about next steps here. Independently of this, I've been starting to think that the call combiner cancellation mechanism may be too heavy-weight. Most callers of SetNotifyOnCancel() wind up needing a mutex for synchronization anyway, but the whole point of the call combiner was to prevent individual filters from needing mutexes, so it's clearly not completely succeeding in its goal here.
One possible alternative would be to move the cancel_stream op out of grpc_transport_stream_op_batch and replace it with a new cancel_stream() method on the filter API, which would basically work the same way as start_transport_stream_op_batch() except that it can be invoked without holding the call combiner. That way, filters may still need mutexes to handle cancellation, but they would not need any synchronization for registering a cancel callback with the call combiner itself.
A more radical approach might be to reconsider whether the call combiner as a whole really still makes sense. Part of the original idea of it was that we thought that there were going to be a lot more filters added over time, so it made sense to amortize the synchronization cost by doing it at a central point instead of having each individual filter do it. But since we are now expecting most plugins to be done as interceptors at the wrapped language level instead of core filters, we should reevaluate whether this trade-off still makes sense.
Anyway, we should talk further about this to figure out next steps, but this PR is a good first step.
Yes, I think this is doable, and I might make the code easier to understand. |
|
looks like this has exposed a bug in the fault injection filter :) |
Actually, thinking about this further, we may want to do both -- i.e., keep the But anyway, we can talk about this more when you're ready to implement it. |
|
Sure! Thanks for reviewing! |
#16288 added
ExecCtx::Flush()to CallCombiner sites whereSetNotifyOnCancel()was invoked but did not add it to places whereCancel()was invoked. This leads to a bug where when RPCs are cancelled, we invoke the call combiner cancellation closure but we do not flush ExecCtx, which can cause SEGFAULTs when the cancellation closure is actually invoked.To fix this we could also add ExecCtx::Flush to sites where we do a
Cancel()but sprinklingExecCtx::Flush()across the codebase and expecting future modifications to remember to flush it seems like a recipe for future bugs. A better way to do this would be to simply change CallCombiner to use Closure::Run instead of ExecCtx::Run. Reverting #16288 firstInternal context - b/179905578