Improve the call size estimate by AspirinSJL · Pull Request #15402 · grpc/grpc

AspirinSJL · 2018-05-16T04:57:07Z

When the first call on a channel creates its arena, it should not estimate the required size from the call stack alone. The estimate should at least also cover the grpc_call and the other allocations a complete call needs. Otherwise the initial zone is too small for the very first allocation, which holds both the grpc_call and the call stack. The initial zone is then wasted, and a second zone is allocated at twice the combined size of the grpc_call and the call stack.

This matters a lot for channels that only ever carry one call.

A test with the internal balancer shows that this PR reduces memory usage per channel (one bidi streaming call per channel) from ~100KB to ~75KB.
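The waste described above can be illustrated with a small model. This is only a sketch under the assumptions stated in the description (an undersized initial zone is left unused, and the second zone is sized at twice the request); `ArenaFootprint` and `FootprintAfterFirstAlloc` are hypothetical names and do not appear in gRPC.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of an arena, NOT gRPC's actual implementation.
// Assumption (taken from the PR description): if the first request does not
// fit in the initial zone, a second zone of twice the request size is added
// and the initial zone is left unused.
struct ArenaFootprint {
  std::size_t total_bytes;   // memory held by all zones
  std::size_t wasted_bytes;  // memory held but not used by the request
};

inline ArenaFootprint FootprintAfterFirstAlloc(std::size_t initial_zone,
                                               std::size_t first_request) {
  assert(first_request > 0);
  if (first_request <= initial_zone) {
    // The request fits: only the slack in the initial zone is unused.
    return {initial_zone, initial_zone - first_request};
  }
  // The request does not fit: the initial zone is skipped entirely.
  const std::size_t second_zone = 2 * first_request;
  const std::size_t total = initial_zone + second_zone;
  return {total, total - first_request};
}
```

With an initial zone of 1000 bytes and a first request of 1500 bytes, this model holds 4000 bytes and uses only 1500 of them. Sizing the initial zone at 1500 bytes up front holds exactly 1500.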

grpc-testing · 2018-05-16T05:00:29Z

****************************************************************

libgrpc.so

     VM SIZE                                                               FILE SIZE
 ++++++++++++++ GROWING                                                 ++++++++++++++
  +0.0%    +144 [None]                                                     +264  +0.0%
  +0.1%     +16 src/core/lib/surface/call.cc                                +16  +0.1%
      +2.7%     +10 [Unmapped]                                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_call_size_estimate_besides_call_stack          +6  [NEW]

  +0.0%    +160 TOTAL                                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-16T05:09:08Z

[trickle] No significant performance differences

yang-g · 2018-05-16T05:23:42Z

src/core/lib/surface/call.h

+/* Get the estimated memory size for a call besides the call stack. Combined
+ * with the size of the call stack, it helps estimate the arena size for the
+ * first call. */
+size_t grpc_call_get_call_size_estimate_besides_call_stack();


Since this is only helping the estimate for the first call, I think we should add _initial_ in the name.

I'd suggest just calling this something like grpc_call_get_initial_size_estimate(). We can document the fact that it does not include the call stack.

grpc-testing · 2018-05-16T05:51:54Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%

markdroth

I thought that the balancer folks were concerned about per-channel overhead, not per-call overhead -- in other words, the amount of overhead for a channel that is connected but does not have any calls started. This PR addresses per-call overhead, not per-channel overhead. Was I misunderstanding their concern, or is this PR addressing something else?

To be clear, I think you're right that there is a problem here, and this PR (with some concerns addressed) does seem like a reasonable approach. But I think we should also be looking at the per-channel overhead to see if there's something we can reduce there.

Please let me know if you have any questions about any of this.

markdroth · 2018-05-16T14:36:34Z

src/core/lib/surface/call.cc

 #define MAX_SEND_EXTRA_METADATA_COUNT 3

+// These estimates are used to create arena for the first call.
+#define ESTIMATED_BATCH_CONTROL_COUNT 5


Where does this number come from? Do we have any data on the average number of batches per call? If the goal here is to ensure that we always have enough space for all batches, then shouldn't we just use MAX_CONCURRENT_BATCHES instead?

More generally, I am concerned that we may be optimizing for the wrong case here. While this change may benefit the specific use case of a service that has one streaming call per channel, unary RPCs are far more common than streaming, and unary RPCs typically have only 1-2 batches. Will the hysteresis code ensure that we don't allocate more memory than we need after a few RPCs in the unary case?

Oh, I thought we allocated a new batch_control object for each batch. We actually re-use it. Then we should definitely use MAX_CONCURRENT_BATCHES here.

Yes, the estimated arena size is updated by previous calls, so subsequent calls get a more accurate estimate. Unless each channel carries only one unary call, we shouldn't be wasting much. With only one unary call per channel, though, we waste 368B * 4.
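For reference, the "368B * 4" figure is the size of one batch_control object times the number of reserved objects a call never touches. A minimal sketch of that arithmetic follows; `UnusedBatchControlBytes` is a hypothetical helper, and the reserved and used counts are inputs to the illustration rather than values read from gRPC.

```cpp
#include <cassert>
#include <cstddef>

// Bytes reserved for batch_control objects that a call never uses.
// All parameters are inputs to this illustration; none are read from gRPC.
inline std::size_t UnusedBatchControlBytes(std::size_t reserved_count,
                                           std::size_t used_count,
                                           std::size_t bytes_each) {
  assert(used_count <= reserved_count);
  return (reserved_count - used_count) * bytes_each;
}
```

Any reserved/used pair that differs by four gives the figure from the thread: `UnusedBatchControlBytes(5, 1, 368)` evaluates to 1472 bytes.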

markdroth · 2018-05-16T14:37:16Z

src/core/lib/surface/call.cc


+// These estimates are used to create arena for the first call.
+#define ESTIMATED_BATCH_CONTROL_COUNT 5
+#define ESTIMATED_MDELEM_COUNT 10


Similar question here: Where does this number come from? Do we have any data on the average number of metadata elements we need to allocate per call?

From the log, I see 11 mdelems allocated. I learned that the number of metadata elements (without any user-defined ones) is stable, so I think a rough estimate from this sample is fine.

It might be safer to use a slightly larger number like 16, especially when each mdelem is just 32B.
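To make the trade-off concrete, here is a rough sketch of how an initial size estimate could be composed from the pieces discussed in this review. `InitialEstimateInputs` and `InitialSizeEstimate` are hypothetical names, and the real `grpc_call_get_initial_size_estimate()` may be computed differently. The per-object sizes used in the note below (368B per batch_control, 32B per mdelem) are the ones quoted in the thread; the size of grpc_call is not given there, so it stays a parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inputs for the sketch; none of these are gRPC definitions.
struct InitialEstimateInputs {
  std::size_t call_struct_bytes;    // size of the call object (not stated in the thread)
  std::size_t batch_control_count;  // batch_control objects to reserve
  std::size_t batch_control_bytes;  // bytes per batch_control
  std::size_t mdelem_count;         // metadata elements to reserve
  std::size_t mdelem_bytes;         // bytes per metadata element
};

// Sum of everything besides the call stack that the first call is expected
// to allocate from its arena.
inline std::size_t InitialSizeEstimate(const InitialEstimateInputs& in) {
  assert(in.batch_control_bytes > 0 && in.mdelem_bytes > 0);
  return in.call_struct_bytes +
         in.batch_control_count * in.batch_control_bytes +
         in.mdelem_count * in.mdelem_bytes;
}
```

Leaving the call struct out (0 bytes), 5 batch_control objects and 10 mdelems come to 2160 bytes. Raising the mdelem count to 16 gives 2352 bytes, so the larger margin costs 192 bytes.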

markdroth · 2018-05-16T14:38:19Z

src/core/lib/surface/call.h

+/* Get the estimated memory size for a call besides the call stack. Combined
+ * with the size of the call stack, it helps estimate the arena size for the
+ * first call. */
+size_t grpc_call_get_call_size_estimate_besides_call_stack();


I'd suggest just calling this something like grpc_call_get_initial_size_estimate(). We can document the fact that it does not include the call stack.

AspirinSJL · 2018-05-16T21:28:29Z

@markdroth From today's meeting, I believe the LB team's usage scenario is one call per channel. This is also how they conduct their load tests.

grpc-testing · 2018-05-16T21:35:39Z

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +264  +0.0%
      +0.0%    +176 [Unmapped]                                 +264  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-16T21:44:02Z

[trickle] No significant performance differences

yang-g · 2018-05-16T21:47:26Z

The sanity check is failing; you may need to run clang-format.

grpc-testing · 2018-05-16T22:10:11Z

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +264  +0.0%
      +0.0%    +176 [Unmapped]                                 +264  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-16T22:19:04Z

[trickle] No significant performance differences

grpc-testing · 2018-05-16T22:26:51Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration    cpu_time    real_time
-------------------------------------------------------------------------------------------  -----------------------  ----------  -----------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/262144/2/1                                 -6%         -6%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

grpc-testing · 2018-05-16T22:50:42Z

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +256  +0.0%
      +0.0%    +176 [Unmapped]                                 +256  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +272  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-16T22:57:28Z

[trickle] No significant performance differences

grpc-testing · 2018-05-16T23:01:52Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

grpc-testing · 2018-05-16T23:42:11Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

markdroth

Nice work!

grpc-testing · 2018-05-17T16:48:06Z

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +256  +0.0%
      +0.0%    +176 [Unmapped]                                 +256  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +272  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-17T16:56:04Z

[trickle] No significant performance differences

grpc-testing · 2018-05-17T17:40:10Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

grpc-testing · 2018-05-18T16:51:46Z

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +112 [None]                                     +232  +0.0%
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +128 TOTAL                                      +248  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]

grpc-testing · 2018-05-18T16:59:24Z

[trickle] No significant performance differences

grpc-testing · 2018-05-18T17:43:48Z

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

AspirinSJL · 2018-05-18T21:02:50Z

Basic Tests MacOS [dbg] (internal CI): FAILURE: macos bins/dbg/client_lb_end2end_test --gtest_filter=ClientLbEnd2endTest.PickFirstBackOffInitialReconnect GRPC_POLL_STRATEGY=poll #15454
Basic Tests MacOS [opt] (internal CI): FAILURE: macos bins/dbg/client_lb_end2end_test --gtest_filter=ClientLbEnd2endTest.PickFirstBackOffInitialReconnect GRPC_POLL_STRATEGY=poll #15454
Basic Tests Multi-language Linux (internal CI): ERROR: Connection reset by peer while running build_python.sh #15455
Bazel Debug build for C/C++: Bazel test flake: "No address added out of total 1 resolved" from different tests under TSAN and dbg #14512, Bazel test flake: " assertion failed: error == GRPC_ERROR_NONE" #15427
Bazel Opt build for C/C++: Bazel test flake: "No address added out of total 1 resolved" from different tests under TSAN and dbg #14512
Interop Cloud-to-Prod Tests (internal CI): cloud to prod: Failed to build interop image build_docker_java #15456
Msan C (internal CI): bins/msan/server_fuzzer_one_entry: server fuzzer assertion #15458
Tsan C (internal CI): data race: h2_compress_nosec_test retry_streaming_succeeds_before_replay_finished GRPC_POLL_STRATEGY=epoll1 #15457

AspirinSJL requested review from markdroth and yang-g May 16, 2018 04:58

AspirinSJL self-assigned this May 16, 2018

yang-g reviewed May 16, 2018

markdroth reviewed May 16, 2018

markdroth approved these changes May 17, 2018

yang-g approved these changes May 17, 2018

Include other usage to estimate call size

25c57a3

AspirinSJL force-pushed the big_zone_for_call branch from 71b4ae3 to 25c57a3 Compare May 17, 2018 16:44

AspirinSJL added the kokoro:force-run label May 18, 2018

kokoro-team removed the kokoro:force-run label May 18, 2018

AspirinSJL merged commit f610029 into grpc:master May 18, 2018

AspirinSJL deleted the big_zone_for_call branch May 18, 2018 21:03

jtattermusch mentioned this pull request May 23, 2018

data race: h2_compress_nosec_test retry_streaming_succeeds_before_replay_finished GRPC_POLL_STRATEGY=epoll1 #15457

Closed

AspirinSJL mentioned this pull request May 23, 2018

Revert arena size fix #15524

Merged

markdroth mentioned this pull request May 24, 2018

Hold ref to call stack while replay batches are pending. #15536

Merged

lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019


5 participants
