Skip to content

Improve the call size estimate#15402

Merged
AspirinSJL merged 1 commit intogrpc:masterfrom
AspirinSJL:big_zone_for_call
May 18, 2018
Merged

Improve the call size estimate#15402
AspirinSJL merged 1 commit intogrpc:masterfrom
AspirinSJL:big_zone_for_call

Conversation

@AspirinSJL
Copy link
Copy Markdown
Contributor

When the first call creates its arena, it shouldn't just estimate the necessary size using the call stack. It should at least include the grpc_call and some other necessary stuff for a complete call. Otherwise, the arena will be too small for the first allocation for both the grpc_call and the call stack. And the initial zone of the arena will be wasted and the second zone will grow to twice the size of grpc_call and the call stack.

This means a lot for the channels that only have one call per channel.

The test of the internal balancer shows that this PR can reduce the memory usage per channel (one bidi steaming call per channel) from ~100KB to ~75KB.

@AspirinSJL AspirinSJL requested review from markdroth and yang-g May 16, 2018 04:58
@AspirinSJL AspirinSJL self-assigned this May 16, 2018
@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                               FILE SIZE
 ++++++++++++++ GROWING                                                 ++++++++++++++
  +0.0%    +144 [None]                                                     +264  +0.0%
  +0.1%     +16 src/core/lib/surface/call.cc                                +16  +0.1%
      +2.7%     +10 [Unmapped]                                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_call_size_estimate_besides_call_stack          +6  [NEW]

  +0.0%    +160 TOTAL                                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

/* Get the estimated memory size for a call besides the call stack. Combined
* with the size of the call stack, it helps estimate the arena size for the
* first call. */
size_t grpc_call_get_call_size_estimate_besides_call_stack();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only helping the estimate for the first call, I think we should add _initial_ in the name.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest just calling this something like grpc_call_get_initial_size_estimate(). We can document the fact that it does not include the call stack.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the balancer folks were concerned about per-channel overhead, not per-call overhead -- in other words, the amount of overhead for a channel that is connected but does not have any calls started. This PR addresses per-call overhead, not per-channel overhead. Was I misunderstanding their concern, or is this PR addressing something else?

To be clear, I think you're right that there is a problem here, and this PR (with some concerns addressed) does seem like a reasonable approach. But I think we should also be looking at the per-channel overhead to see if there's something we can reduce there.

Please let me know if you have any questions about any of this.

#define MAX_SEND_EXTRA_METADATA_COUNT 3

// These estimates are used to create arena for the first call.
#define ESTIMATED_BATCH_CONTROL_COUNT 5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this number come from? Do we have any data on the average number of batches per call? If the goal here is to ensure that we always have enough space for all batches, then shouldn't we just use MAX_CONCURRENT_BATCHES instead?

More generally, I am concerned that we may be optimizing for the wrong case here. While this change may benefit the specific use case of a service that has one streaming call per channel, unary RPCs are far more common than streaming, and unary RPCs typically have only 1-2 batches. Will the hysteresis code ensure that we don't allocate more memory than we need after a few RPCs in the unary case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought we allocate a new batch_control object for each batch. We actually re-use it. Then we should definitely use MAX_CONCURRENT_BATCHES here.

Yes, the estimated arena size will be updated by previous calls. So subsequent calls will have more accurate size estimated. Unless each channel only has one unary call, we shouldn't be wasting a lot. If there is only one unary call per channel, we are wasting 368B * 4 though.


// These estimates are used to create arena for the first call.
#define ESTIMATED_BATCH_CONTROL_COUNT 5
#define ESTIMATED_MDELEM_COUNT 10
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here: Where does this number come from? Do we have any data on the average number of metadata elements we need to allocate per call?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the log, I see 11 mdelem allocated. I learned that the number of metadata (without any user-defined ones) is stable, so I think a rough estimate by this sample is fine.

It might be safer to use a slightly larger number like 16, especially when each mdelem is just 32B.

/* Get the estimated memory size for a call besides the call stack. Combined
* with the size of the call stack, it helps estimate the arena size for the
* first call. */
size_t grpc_call_get_call_size_estimate_besides_call_stack();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest just calling this something like grpc_call_get_initial_size_estimate(). We can document the fact that it does not include the call stack.

@AspirinSJL
Copy link
Copy Markdown
Contributor Author

AspirinSJL commented May 16, 2018

@markdroth From today's meeting, I believe the use scenario of the LB team is one channel plus one call. This is also how they conduct the load tests.

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +264  +0.0%
      +0.0%    +176 [Unmapped]                                 +264  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@yang-g
Copy link
Copy Markdown
Contributor

yang-g commented May 16, 2018

sanity, you may need clang-format

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +264  +0.0%
      +0.0%    +176 [Unmapped]                                 +264  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +280  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration    cpu_time    real_time
-------------------------------------------------------------------------------------------  -----------------------  ----------  -----------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/262144/2/1                                 -6%         -6%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +256  +0.0%
      +0.0%    +176 [Unmapped]                                 +256  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +272  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

1 similar comment
@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@AspirinSJL AspirinSJL force-pushed the big_zone_for_call branch from 71b4ae3 to 25c57a3 Compare May 17, 2018 16:44
@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +208 [None]                                     +256  +0.0%
      +0.0%    +176 [Unmapped]                                 +256  +0.0%
      +2.9%     +32 [None]                                        0  [ = ]
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +224 TOTAL                                      +272  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                               FILE SIZE
 ++++++++++++++ GROWING                                 ++++++++++++++
  +0.0%    +112 [None]                                     +232  +0.0%
  +0.1%     +16 src/core/lib/surface/call.cc                +16  +0.1%
      +2.7%     +10 [Unmapped]                                  +10  +2.7%
      [NEW]      +6 grpc_call_get_initial_size_estimate          +6  [NEW]

  +0.0%    +128 TOTAL                                      +248  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                    atm_cas_per_iteration
-------------------------------------------------------------------------------------------  -----------------------
BM_StreamingPingPongWithCoalescingApi<InProcess, NoOpMutator, NoOpMutator>/134217728/1/1     -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/134217728/1/1  -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/134217728/134217728                    -4%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/134217728/134217728                 -4%

@AspirinSJL AspirinSJL merged commit f610029 into grpc:master May 18, 2018
@AspirinSJL AspirinSJL deleted the big_zone_for_call branch May 18, 2018 21:03
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants