
Refactor max idle filter #13594

Closed
y-zeng wants to merge 6 commits into grpc:master from y-zeng:max_age_fix

Conversation

@y-zeng
Contributor

@y-zeng y-zeng commented Dec 1, 2017

The previous implementation is faulty. Assume we have one active call:

  • When it ends, decrease_call_count is invoked; its gpr_atm_full_fetch_add decreases call_count from 1 to 0.
  • At this moment a new call arrives; increase_call_count is called, and its gpr_atm_full_fetch_add sees that call_count is 0, so it cancels the timer.
  • decrease_call_count then continues its operation and uses grpc_timer_init to set up the timer.

After these operations there is still an active call, but the max_idle timer is set. When this remaining call ends, it causes the crash reported in #12257.

The new implementation does not cancel max_idle_timer (unless the channel is shutting down). Instead, it uses idle_state to record whether max_idle_timer is still valid. If the timer callback runs while max_idle_timer is valid (i.e. idle_state is MAX_IDLE_STATE_TIMER_SET), the channel is closed due to idleness; otherwise the channel is left unchanged.

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE                                                                              FILE SIZE
 ++++++++++++++ GROWING                                                                ++++++++++++++
  [ = ]       0 [None]                                                                 +1.37Ki  +0.0%
   +16%    +432 src/core/ext/filters/max_age/max_age_filter.cc                            +432   +16%
      [NEW]    +429 max_idle_timer_cb                                                         +429  [NEW]
      [NEW]    +139 increase_call_count(grpc_exec_ctx*, channel_data*) [clone .isra.0]        +139  [NEW]
      +160%    +131 decrease_call_count                                                       +131  +160%
      +4.1%     +32 init_channel_elem                                                          +32  +4.1%
       +50%      +3 destroy_call_elem                                                           +3   +50%

  +0.0%    +432 TOTAL                                                                  +1.79Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing

[trickle] No significant performance differences

@grpc-testing

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_unary_ping_pong.BM_UnaryPingPong_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__1_0.counters.old: 1


[microbenchmarks] Performance differences noted:
Benchmark                                                                                cpu_time    real_time
---------------------------------------------------------------------------------------  ----------  -----------
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/32768/2/1  +5%         +5%

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE                                                                              FILE SIZE
 ++++++++++++++ GROWING                                                                ++++++++++++++
  +0.0%     +16 [None]                                                                 +1.17Ki  +0.0%
   +15%    +400 src/core/ext/filters/max_age/max_age_filter.cc                            +400   +15%
      [NEW]    +397 max_idle_timer_cb                                                         +397  [NEW]
      [NEW]    +139 increase_call_count(grpc_exec_ctx*, channel_data*) [clone .isra.0]        +139  [NEW]
      +160%    +131 decrease_call_count                                                       +131  +160%
      +4.1%     +32 init_channel_elem                                                          +32  +4.1%
       +50%      +3 destroy_call_elem                                                           +3   +50%

  +0.0%    +416 TOTAL                                                                  +1.56Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing

[trickle] No significant performance differences

@grpc-testing

[microbenchmarks] No significant performance differences

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE                                                                              FILE SIZE
 ++++++++++++++ GROWING                                                                ++++++++++++++
  +0.0%     +16 [None]                                                                 +1.20Ki  +0.0%
   +15%    +400 src/core/ext/filters/max_age/max_age_filter.cc                            +400   +15%
      [NEW]    +397 max_idle_timer_cb                                                         +397  [NEW]
      [NEW]    +139 increase_call_count(grpc_exec_ctx*, channel_data*) [clone .isra.0]        +139  [NEW]
      +160%    +131 decrease_call_count                                                       +131  +160%
      +4.1%     +32 init_channel_elem                                                          +32  +4.1%
       +50%      +3 destroy_call_elem                                                           +3   +50%

  +0.0%    +416 TOTAL                                                                  +1.59Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing

[trickle] No significant performance differences

@grpc-testing

[microbenchmarks] No significant performance differences

@y-zeng
Contributor Author

y-zeng commented Dec 11, 2017

Ping 😅

@y-zeng
Contributor Author

y-zeng commented Jan 22, 2018

cc @mehrdada

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE                                                      FILE SIZE
 ++++++++++++++ GROWING                                        ++++++++++++++
   +17%    +448 src/core/ext/filters/max_age/max_age_filter.cc    +448   +17%
      [NEW]    +394 max_idle_timer_cb                                 +394  [NEW]
      +195%    +160 decrease_call_count                               +160  +195%
      +286%    +103 increase_call_count                               +103  +286%
      +2.1%     +16 init_channel_elem                                  +16  +2.1%

 -+-+-+-+-+-+-+ MIXED                                          +-+-+-+-+-+-+-
  -0.0%      -8 [None]                                         +1.26Ki  +0.0%

  +0.0%    +440 TOTAL                                          +1.70Ki  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing

[trickle] No significant performance differences

@grpc-testing

[microbenchmarks] No significant performance differences

&chand->max_idle_timer,
grpc_core::ExecCtx::Get()->Now() + chand->max_connection_idle,
&chand->close_max_idle_channel);
gpr_atm_no_barrier_store(&chand->last_enter_idle_time_millis,
Contributor

Shouldn't this be set when you actually change the idle_state to MAX_IDLE_STATE_TIMER_SET ?

Contributor Author

Thanks a lot for the review!

The MAX_IDLE_STATE_TIMER_SET branch uses grpc_core::ExecCtx::Get()->Now() inside grpc_timer_init() as last_enter_idle_time_millis. The stored last_enter_idle_time_millis is prepared for the MAX_IDLE_STATE_SEEN_EXIT_IDLE branch.

By setting it beforehand and using release semantics, we make sure that if a thread sees MAX_IDLE_STATE_SEEN_ENTER_IDLE, it must also see the correct last_enter_idle_time_millis.

GRPC_LOG_IF_ERROR("close_max_idle_channel", error);
while (true) {
gpr_atm idle_state = gpr_atm_acq_load(&chand->idle_state);
if (idle_state == MAX_IDLE_STATE_TIMER_SET) {
Contributor

Might help to rewrite this as a switch statement, to be consistent with the other two places.

Contributor Author

I'll make the change in #14122

grpc_connectivity_state connectivity_state;
/* Number of active calls */
gpr_atm call_count;
/* 'idle_state' holds the states of max_idle_timer and channel idleness.
Contributor

Thanks for the detailed comment here. If you don't mind, it would help if you wrote a brief description of what each state means, especially MAX_IDLE_STATE_SEEN_EXIT_IDLE and MAX_IDLE_STATE_SEEN_ENTER_IDLE.

Contributor Author

I'll add the description in #14122

y-zeng added a commit that referenced this pull request Jan 24, 2018
@mehrdada
Contributor

Anything blocking this from being merged? I'm asking since the backport is already on 1.9.x and will come to master on the next upmerge; that will cause a conflict if there are additional changes on this one, so clicking the merge button before the upmerge might be a good idea.

@abaldeva

abaldeva commented Feb 6, 2018

y-zeng,

I tested the fix with the grpc 1.9.0 upgrade. Unfortunately, enabling the option now causes a massive memory leak instead of a crash, which is also bad. I am bombarding my server with an RPC; here is a log from Windows (I am able to reproduce the leak on Linux as well).

Would you like a new issue created for it or should this really be addressed as part of this issue?
memory_leak.txt

Thanks.

@y-zeng
Contributor Author

y-zeng commented Feb 6, 2018

@mehrdada There are a few differences between this PR and the one merged in 1.9.x. Maybe we can just keep the changes in the up-merge and close this one?

@y-zeng
Contributor Author

y-zeng commented Feb 6, 2018

@abaldeva Thanks for reporting the memory leak issue! I'll try reproducing it on 1.9.0 and see if we need to create a new issue to track it.

@abaldeva

@y-zeng, did the log file I added help you find the source of the leak?

Thanks.

@mehrdada
Contributor

mehrdada commented Feb 13, 2018

@y-zeng Totally missed your comment here. I upmerged the changes to master about a week ago. Please take a quick look at current master and feel free to close this if it is satisfactory.

@mehrdada mehrdada closed this Feb 16, 2018
@abaldeva

Created a new issue for the memory leak here - #14447

@abaldeva abaldeva mentioned this pull request May 16, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jan 20, 2019
@lock lock bot unassigned sreecha Jan 20, 2019