Conversation
Ping 😅

cc @mehrdada
```cpp
                  &chand->max_idle_timer,
                  grpc_core::ExecCtx::Get()->Now() + chand->max_connection_idle,
                  &chand->close_max_idle_channel);
  gpr_atm_no_barrier_store(&chand->last_enter_idle_time_millis,
```
Shouldn't this be set when you actually change `idle_state` to `MAX_IDLE_STATE_TIMER_SET`?
Thanks a lot for the review!
The `MAX_IDLE_STATE_TIMER_SET` branch uses `grpc_core::ExecCtx::Get()->Now()` inside `grpc_timer_init()` as `last_enter_idle_time_millis`. The stored `last_enter_idle_time_millis` is prepared for the `MAX_IDLE_STATE_SEEN_EXIT_IDLE` branch.
By setting it beforehand and using release semantics, we can make sure that if a thread sees `MAX_IDLE_STATE_SEEN_ENTER_IDLE`, it must also see the correct `last_enter_idle_time_millis`.
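The publication pattern being described can be sketched with C11 atomics. This is a simplified, hypothetical stand-in for the channel fields (the real code uses `gpr_atm` primitives), showing why the release store on `idle_state` makes the earlier timestamp store visible to readers:

```c
#include <stdatomic.h>

/* Simplified stand-ins for the channel's fields (not the actual gRPC code). */
static atomic_long last_enter_idle_time_millis;
static atomic_int idle_state;

enum { MAX_IDLE_STATE_INIT, MAX_IDLE_STATE_SEEN_ENTER_IDLE };

/* Writer: store the timestamp first, then publish the state with release
 * semantics, so the timestamp store is ordered before the state store. */
void enter_idle(long now_millis) {
  atomic_store_explicit(&last_enter_idle_time_millis, now_millis,
                        memory_order_relaxed);
  atomic_store_explicit(&idle_state, MAX_IDLE_STATE_SEEN_ENTER_IDLE,
                        memory_order_release);
}

/* Reader: an acquire load of idle_state guarantees that, if it observes
 * MAX_IDLE_STATE_SEEN_ENTER_IDLE, the earlier timestamp store is also
 * visible. Returns -1 if the channel has not entered idleness. */
long read_idle_time_if_entered(void) {
  if (atomic_load_explicit(&idle_state, memory_order_acquire) ==
      MAX_IDLE_STATE_SEEN_ENTER_IDLE) {
    return atomic_load_explicit(&last_enter_idle_time_millis,
                                memory_order_relaxed);
  }
  return -1;
}
```

The acquire/release pair is what lets the timestamp store use `no_barrier`/relaxed ordering without being reordered past the state publication.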
```cpp
  GRPC_LOG_IF_ERROR("close_max_idle_channel", error);
  while (true) {
    gpr_atm idle_state = gpr_atm_acq_load(&chand->idle_state);
    if (idle_state == MAX_IDLE_STATE_TIMER_SET) {
```
Might help to rewrite this as a switch statement, to be consistent with the other two places.
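A hypothetical sketch of the suggested switch form, using plain C stand-ins for the state constants (the real code operates on `gpr_atm` loads and the handling of the other states is elided):

```c
enum idle_state_t {
  MAX_IDLE_STATE_INIT,
  MAX_IDLE_STATE_TIMER_SET,
  MAX_IDLE_STATE_SEEN_EXIT_IDLE,
  MAX_IDLE_STATE_SEEN_ENTER_IDLE
};

/* Hypothetical simplified version of the timer-callback dispatch,
 * rewritten as a switch as the reviewer suggests.  Returns 1 if the
 * channel should be closed for idleness, 0 otherwise. */
int handle_timer_fired(enum idle_state_t idle_state) {
  switch (idle_state) {
    case MAX_IDLE_STATE_TIMER_SET:
      /* Timer is still valid: close the idle channel. */
      return 1;
    case MAX_IDLE_STATE_SEEN_EXIT_IDLE:
    case MAX_IDLE_STATE_SEEN_ENTER_IDLE:
    default:
      /* Timer was invalidated by call activity; leave the channel alone. */
      return 0;
  }
}
```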
```cpp
  grpc_connectivity_state connectivity_state;
  /* Number of active calls */
  gpr_atm call_count;
  /* 'idle_state' holds the states of max_idle_timer and channel idleness.
```
Thanks for the detailed comment here. If you don't mind, it would help if you wrote a brief description of what each state means, especially `MAX_IDLE_STATE_SEEN_EXIT_IDLE` and `MAX_IDLE_STATE_SEEN_ENTER_IDLE`.
Anything blocking this from being merged? I'm asking since the backport is already on 1.9.x and will come to master on the next upmerge; it will cause a conflict if there are additional changes on this one, so clicking the merge button before the upmerge might be a good idea.
y-zeng, I tested the fix with the grpc 1.9.0 upgrade. Unfortunately, enabling the option now causes a massive memory leak instead of crashing, which is also bad. I am bombarding my server with an RPC, and here is a log on Windows (I am able to reproduce the leak on Linux as well). Would you like a new issue created for it, or should this really be addressed as part of this issue? Thanks.
@mehrdada There are a few differences between this PR and the one merged in 1.9.x. Maybe we can just keep the changes in the up-merge and close this one?
@abaldeva Thanks for reporting the memory leak issue! I'll try reproducing it on 1.9.0 and see if we need to create a new issue to track it.
@y-zeng, did the log file I added help you in seeing the source of the leak? Thanks.
@y-zeng totally missed your comment here. I have upmerged changes to master last week or so. Please take a quick look at current master and feel free to close this if it is satisfactory.
Created a new issue for the memory leak here - #14447 |
The previous implementation is faulty. Assume we have one active call:

1. `decrease_call_count` is invoked; `gpr_atm_full_fetch_add` decreases `call_count` from 1 to 0.
2. `increase_call_count` is called; `gpr_atm_full_fetch_add` sees that `call_count` is 0. It then cancels the timer.
3. `decrease_call_count` then continues its operation, and uses `grpc_timer_init` to set up the timer.

After these operations, there is still an active call, but the `max_idle_timer` is set. At the time this remaining call ends, it will cause the crash reported in #12257.

The new implementation does not cancel `max_idle_timer` (unless the channel is shutting down). Instead, it uses `idle_state` to record whether `max_idle_timer` is still valid. If the timer callback is called when `max_idle_timer` is valid (i.e. `idle_state` is `MAX_IDLE_STATE_TIMER_SET`), the channel will be closed due to idleness; otherwise the channel won't be changed.
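The new scheme can be sketched in simplified C with C11 atomics. These are hypothetical stand-ins for the gRPC internals (the real code uses `gpr_atm` primitives and `grpc_timer`, and handles more states); the point is that the timer is invalidated rather than cancelled, so the race above cannot arm a stale timer:

```c
#include <stdatomic.h>

enum {
  MAX_IDLE_STATE_INIT,           /* no valid timer pending */
  MAX_IDLE_STATE_TIMER_SET,      /* timer armed while channel is idle */
  MAX_IDLE_STATE_SEEN_EXIT_IDLE  /* a call started; the armed timer is stale */
};

static atomic_int idle_state = MAX_IDLE_STATE_INIT;

/* Called when the call count drops to zero: arm the timer and mark it
 * valid.  (The timer machinery itself is elided in this sketch.) */
void on_channel_becomes_idle(void) {
  atomic_store(&idle_state, MAX_IDLE_STATE_TIMER_SET);
}

/* Called when a new call starts: instead of cancelling the timer, just
 * invalidate it by recording that the channel left idleness. */
void on_channel_exits_idle(void) {
  atomic_store(&idle_state, MAX_IDLE_STATE_SEEN_EXIT_IDLE);
}

/* Timer callback: close the channel only if the timer is still valid,
 * i.e. nothing exited idleness since the timer was armed.  Returns 1 if
 * the channel should be closed due to idleness, 0 otherwise. */
int max_idle_timer_cb(void) {
  int expected = MAX_IDLE_STATE_TIMER_SET;
  if (atomic_compare_exchange_strong(&idle_state, &expected,
                                     MAX_IDLE_STATE_INIT)) {
    return 1; /* timer still valid: close the idle channel */
  }
  return 0; /* timer was stale: leave the channel alone */
}
```

The compare-and-exchange in the callback is what makes the decision atomic: either the callback wins and closes the channel, or a concurrent `on_channel_exits_idle` got there first and the timer firing becomes a no-op.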