Remove memset(0) from arena allocate memory.#16944
Conversation
|
|
|
BTW, the sanity check failures are due to calling |
|
|
markdroth
left a comment
There was a problem hiding this comment.
I've reviewed everything except the chttp2 changes; I assume @ncteisen will do that.
Thanks for doing this!
Reviewed 22 of 22 files at r1.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @soheilhy, @yang-g, @AspirinSJL, @yashykt, @ncteisen, @vjpai, and @apolcyn)
src/core/ext/filters/client_channel/client_channel.cc, line 1590 at r1 (raw file):
batch_data->batch.recv_trailing_metadata = false; batch_data->batch.cancel_stream = false;
Nit: Please remove unnecessary blank line.
src/core/ext/filters/client_channel/client_channel.cc, line 3167 at r1 (raw file):
calld->owning_call = args->call_stack; calld->call_combiner = args->call_combiner; memset(&calld->deadline_state, 0, sizeof(calld->deadline_state));
I suggest moving this into grpc_deadline_state_init().
src/core/ext/filters/client_channel/client_channel.cc, line 3172 at r1 (raw file):
calld->deadline); }
Nit: Please remove unnecessary blank line.
src/core/ext/filters/client_channel/client_channel.cc, line 3173 at r1 (raw file):
} calld->subchannel_call = nullptr;
General comment, applies to most of the changes in this PR:
Please initialize these fields in the same order in which they are defined in the struct, so that it's easy to verify that we haven't missed anything. (If the order here is performance sensitive, this seems rather brittle, since we have nothing guaranteeing that we won't change the order later.)
Alternatively... Would it make sense to set the initial values for these fields in the spot where they are defined, and then change this code to do a C++ placement-new to initialize it? That way, this could all be done in one place, and there would be no need for special handling of the C++ members (such as the RefCountedPtr<> members). That should also allow us to eliminate the use of ManualConstructor<> for the send_messages field. Note that this would require manually invoking the dtor from cc_destroy_call_elem().
It seems like the latter approach would work well in a number of places (all the filters and for the grpc_call struct, and the grpc_transport_stream_op_batch struct).
src/core/ext/filters/client_channel/client_channel.cc, line 3187 at r1 (raw file):
calld->peer_string = nullptr; calld->subchannel_call = nullptr; calld->cancel_error = nullptr;
This should be GRPC_ERROR_NONE.
src/core/ext/filters/client_channel/lb_policy/grpclb/client_load_reporting_filter.cc, line 76 at r1 (raw file):
// Get stats object from context and take a ref. GPR_ASSERT(args->context != nullptr); memset(&calld->client_stats, 0, sizeof(calld->client_stats));
This is another case where C++ placement-new might be helpful.
src/core/ext/filters/deadline/deadline_filter.cc, line 202 at r1 (raw file):
struct start_timer_after_init_state* state = static_cast<struct start_timer_after_init_state*>( gpr_malloc(sizeof(*state)));
Suggest using grpc_core::New<> here (and grpc_core::Delete<> in start_timer_after_init()), so that the field below can be initialized in-line.
src/core/ext/filters/http/client/http_client_filter.cc, line 448 at r1 (raw file):
calld->call_combiner = args->call_combiner; calld->recv_initial_metadata = nullptr; calld->recv_initial_metadata_error = nullptr;
GRPC_ERROR_NONE
src/core/ext/filters/http/client/http_client_filter.cc, line 456 at r1 (raw file):
calld->original_send_message_on_complete = nullptr; calld->recv_initial_metadata = nullptr; calld->recv_initial_metadata_error = nullptr;
GRPC_ERROR_NONE
src/core/ext/filters/http/server/http_server_filter.cc, line 438 at r1 (raw file):
calld->have_read_stream = false; calld->recv_initial_metadata_flags = nullptr; calld->recv_initial_metadata_ready_error = nullptr;
GRPC_ERROR_NONE
src/core/ext/filters/http/server/http_server_filter.cc, line 445 at r1 (raw file):
calld->recv_message = nullptr; calld->original_recv_trailing_metadata_ready = nullptr; calld->recv_initial_metadata_ready_error = nullptr;
GRPC_ERROR_NONE
src/core/ext/filters/message_size/message_size_filter.cc, line 238 at r1 (raw file):
calld->original_recv_trailing_metadata_ready = nullptr; calld->seen_recv_trailing_metadata = false; calld->recv_trailing_metadata_error = nullptr;
GRPC_ERROR_NONE
src/core/lib/gpr/arena.cc, line 93 at r1 (raw file):
}; static void* malloc_aligned(size_t size) {
There's not really any need for this function anymore; I suggest just changing the callers to directly call gpr_malloc_aligned().
src/core/lib/security/transport/server_auth_filter.cc, line 254 at r1 (raw file):
calld->original_recv_initial_metadata_ready = nullptr; calld->original_recv_trailing_metadata_ready = nullptr; calld->recv_initial_metadata_error = nullptr;
GRPC_ERROR_NONE
src/core/lib/security/transport/server_auth_filter.cc, line 255 at r1 (raw file):
calld->original_recv_trailing_metadata_ready = nullptr; calld->recv_initial_metadata_error = nullptr; calld->recv_trailing_metadata_error = nullptr;
GRPC_ERROR_NONE
src/core/lib/surface/call.cc, line 297 at r1 (raw file):
GPR_TIMER_SCOPE("grpc_call_create", 0); GRPC_CHANNEL_INTERNAL_REF(args->channel, "call");
Why move this to the top of this function? Seems unrelated to the rest of this PR.
src/core/lib/surface/call.cc, line 310 at r1 (raw file):
gpr_arena_alloc(arena, GPR_ROUND_UP_TO_ALIGNMENT_SIZE(sizeof(grpc_call)) + channel_stack->call_stack_size); call = reinterpret_cast<grpc_call*>(mem);
Any reason for doing this in a separate statement from the allocation?
src/core/lib/surface/call.cc, line 314 at r1 (raw file):
call->arena = arena; call->cq = args->cq; *out_call = call;
How about moving this either up to line 311, or down to the end of this function, right before we return? That way, it's not sitting right in the middle of the fields we're initializing.
src/core/lib/surface/call.cc, line 1162 at r1 (raw file):
bctl->op.recv_trailing_metadata = false; bctl->op.cancel_stream = false; bctl->op.handler_private.extra_arg = nullptr;
I don't think this needs to be initialized here -- it will be set before it ever gets used.
soheilhy
left a comment
There was a problem hiding this comment.
Thank you @markdroth for the quick and great review!
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @markdroth, @soheilhy, @yang-g, @AspirinSJL, @yashykt, @ncteisen, @vjpai, and @apolcyn)
src/core/ext/filters/client_channel/client_channel.cc, line 1590 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Nit: Please remove unnecessary blank line.
Sure, done.
src/core/ext/filters/client_channel/client_channel.cc, line 3167 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I suggest moving this into
grpc_deadline_state_init().
Actually your comment reminded me that, I fixed the underlying bug that would require this memset(). This is now removed. :-)
src/core/ext/filters/client_channel/client_channel.cc, line 3172 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Nit: Please remove unnecessary blank line.
Sure, done.
src/core/ext/filters/client_channel/client_channel.cc, line 3173 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
General comment, applies to most of the changes in this PR:
Please initialize these fields in the same order in which they are defined in the struct, so that it's easy to verify that we haven't missed anything. (If the order here is performance sensitive, this seems rather brittle, since we have nothing guaranteeing that we won't change the order later.)
Alternatively... Would it make sense to set the initial values for these fields in the spot where they are defined, and then change this code to do a C++ placement-new to initialize it? That way, this could all be done in one place, and there would be no need for special handling of the C++ members (such as the
RefCountedPtr<>members). That should also allow us to eliminate the use ofManualConstructor<>for thesend_messagesfield. Note that this would require manually invoking the dtor fromcc_destroy_call_elem().It seems like the latter approach would work well in a number of places (all the filters and for the
grpc_callstruct, and thegrpc_transport_stream_op_batchstruct).
Certainly, I will reorder the fields in the next revision while waiting for Noah's review on HTTP2 stuff. I've reordered some but it's not complete yet.
As for C++, I agree that it would the right approach, but the problem is that at the moment we lack proper ctors. When we use placement new, we will incur a memset(0), which we really don't want to initialize (or we will initialize later). When I tried that approach in Sep, I observe 5-10% regression in the microbenchmarks and application benchmarks.
I think we should definitely do that (I will make sure it happens), but perhaps later as a follow up to this patch. We can do that on a per class basis: add proper ctor, and then move call data init to placement new.
P.S. For grpc_call, specifically, it will be a bit tricky due to cache misses. There are too many blobs of data that we really don't want to initialize before filter initialization.
src/core/ext/filters/client_channel/client_channel.cc, line 3187 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be
GRPC_ERROR_NONE.
Good catch. Done.
src/core/ext/filters/deadline/deadline_filter.cc, line 202 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest using
grpc_core::New<>here (andgrpc_core::Delete<>instart_timer_after_init()), so that the field below can be initialized in-line.
I wasn't aware of those methods. Thank you for the pointers. Please see my comment about C++'ifying the code above. While I agree that's the right approach for long term, here a simple initialization would come at a cost of initialization the closure structure twice (one in new and one in GRPC_CLOSURE_INIT). Would you mind please if we do this later?
src/core/ext/filters/http/client/http_client_filter.cc, line 448 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/ext/filters/http/client/http_client_filter.cc, line 456 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/ext/filters/http/server/http_server_filter.cc, line 438 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/ext/filters/http/server/http_server_filter.cc, line 445 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/ext/filters/message_size/message_size_filter.cc, line 238 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/lib/gpr/arena.cc, line 93 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
There's not really any need for this function anymore; I suggest just changing the callers to directly call
gpr_malloc_aligned().
Sure, done. I don't have a strong preference here, but thought it would create a larger diff. so I avoided that.
src/core/lib/security/transport/server_auth_filter.cc, line 254 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/lib/security/transport/server_auth_filter.cc, line 255 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
GRPC_ERROR_NONE
Done.
src/core/lib/surface/call.cc, line 297 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why move this to the top of this function? Seems unrelated to the rest of this PR.
Good question. So, the problem is that initialization of the whole call structure causes a lot of cache to be evicted. When we reach the line to ref the channel after initialization, we always had a cache-miss hence a regression with this patch. Previously, because memset(0) was before everything else, we would end up pulling the channel into cache before entering this function.
So, to avoid a single digit performance regression I moved it to top of function before everything else.
src/core/lib/surface/call.cc, line 310 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Any reason for doing this in a separate statement from the allocation?
Good catch. I used this during the (very painful :P) debugging of this patch. Removed.
src/core/lib/surface/call.cc, line 314 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
How about moving this either up to line 311, or down to the end of this function, right before we return? That way, it's not sitting right in the middle of the fields we're initializing.
Excellent point. Done.
src/core/lib/surface/call.cc, line 1162 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this needs to be initialized here -- it will be set before it ever gets used.
Oh, yes, thank you for catching this. I think I have fixed the underlying cause of this issue, but forgot to remove it.
soheilhy
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @markdroth, @soheilhy, @yang-g, @AspirinSJL, @yashykt, @ncteisen, @vjpai, and @apolcyn)
src/core/ext/filters/client_channel/lb_policy/grpclb/client_load_reporting_filter.cc, line 76 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is another case where C++ placement-new might be helpful.
I missed to reply this one. Please see my comments above.
I can change this one to calld->client_stats = {}; if you would prefer that. Thank you.
|
|
|
|
soheilhy
left a comment
There was a problem hiding this comment.
Reviewable status: 10 of 22 files reviewed, 17 unresolved discussions (waiting on @markdroth, @soheilhy, @yang-g, @AspirinSJL, @yashykt, @ncteisen, @vjpai, and @apolcyn)
src/core/ext/filters/client_channel/client_channel.cc, line 3173 at r1 (raw file):
Previously, soheilhy (Soheil Hassas Yeganeh) wrote…
Certainly, I will reorder the fields in the next revision while waiting for Noah's review on HTTP2 stuff. I've reordered some but it's not complete yet.
As for C++, I agree that it would the right approach, but the problem is that at the moment we lack proper ctors. When we use placement new, we will incur a memset(0), which we really don't want to initialize (or we will initialize later). When I tried that approach in Sep, I observe 5-10% regression in the microbenchmarks and application benchmarks.
I think we should definitely do that (I will make sure it happens), but perhaps later as a follow up to this patch. We can do that on a per class basis: add proper ctor, and then move call data init to placement new.
P.S. For
grpc_call, specifically, it will be a bit tricky due to cache misses. There are too many blobs of data that we really don't want to initialize before filter initialization.
I made a pass through all initialization and reorder them to match the definition order of fields. As I mentioned grpc_call is tricky and I would like to send a follow up patch to reorder the fields to make them more cache friendly.
P.S. there are a few additional changes from error = nullptr to error = GRPC_ERROR_NONE in the original code, and a few initializations of booleans using false/true instead of previously 0/1.
PTAL. Thank you
|
|
|
|
| const grpc_call_element_args* args) { | ||
| call_data* calld = static_cast<call_data*>(elem->call_data); | ||
| calld->call_combiner = args->call_combiner; | ||
| calld->recv_initial_metadata = nullptr; |
There was a problem hiding this comment.
I don't know if it makes a perf difference, but a lot of these fields will never be read until after they are set in hc_start_transport_stream_op_batch, so even these initializations might not be needed
There was a problem hiding this comment.
Thank you @ncteisen for the review. I had removed the ones that show up in profiles (most notably closures and mdelem fields) but, certainly, let me take another stab at it and remove more of these unnecessary initialization. I will hopefully upload another commit by tomorrow morning to remove the remaining unnecessary ones.
There was a problem hiding this comment.
It is also something we could do in a separate PR, to make the overall changes smaller in size
There was a problem hiding this comment.
Yes, great point on the size and safety. Please allow me to give it another shot this afternoon, and I will post the results here.
| /* one ref is for destroy */ | ||
| gpr_ref_init(&t->refs, 1); | ||
| t->combiner = grpc_combiner_create(); | ||
| t->ep = ep; |
There was a problem hiding this comment.
Could leave a comment somewhere in this functions that these initializations should occur in the same order as the fields are in the struct itself?
| const grpc_call_element_args* args) { | ||
| call_data* calld = static_cast<call_data*>(elem->call_data); | ||
| calld->call_combiner = args->call_combiner; | ||
| calld->recv_initial_metadata = nullptr; |
There was a problem hiding this comment.
Some of these fields are never read until they are set in hc_start_transport_stream_op_batch, is it worth considering perf impact of leaving them uninitialized till then?
Same comment for all the cases where we can gaurantee a field will be set before it's read
|
Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement. After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.
Regards, |
soheilhy
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 28 files reviewed, 19 unresolved discussions (waiting on @markdroth, @soheilhy, @ncteisen, @yang-g, @AspirinSJL, @yashykt, @vjpai, and @apolcyn)
src/core/ext/filters/client_channel/client_channel.cc, line 3173 at r1 (raw file):
Previously, soheilhy (Soheil Hassas Yeganeh) wrote…
I made a pass through all initialization and reorder them to match the definition order of fields. As I mentioned
grpc_callis tricky and I would like to send a follow up patch to reorder the fields to make them more cache friendly.P.S. there are a few additional changes from
error = nullptrtoerror = GRPC_ERROR_NONEin the original code, and a few initializations of booleans using false/true instead of previously 0/1.PTAL. Thank you
As promised I added proper C++ ctor and dtor. PTAL :-)
Also, as mentioned grpc_call is a bit more than ctor/dtor but the rest have proper C++ initialization now.
soheilhy
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 28 files reviewed, 19 unresolved discussions (waiting on @markdroth, @ncteisen, @soheilhy, @yang-g, @AspirinSJL, @yashykt, @vjpai, and @apolcyn)
src/core/ext/transport/chttp2/transport/chttp2_transport.cc, line 488 at r3 (raw file):
Previously, ncteisen (Noah Eisen) wrote…
Could leave a comment somewhere in this functions that these initializations should occur in the same order as the fields are in the struct itself?
Noah I made this all C++. So, PTAL. It's a bit different (in a good way) now.
|
|
|
|
|
36810e4 to
16f9d61
Compare
There was a problem hiding this comment.
@ncteisen This one was caught by running all tests internally. PTAL.
|
|
|
|
16f9d61 to
7679177
Compare
|
|
|
|
Callers are updated to properly initialize the memory. This behavior can be overridden using GRPC_ARENA_INIT_STRATEGY environment variable.
7679177 to
48e4a81
Compare
|
|
|
|
|
The only test failure was a tool failure. So, I think we can ignore. |
Callers should properly initialize the memory.
Note that to avoid performance regressions we need some
reordering in the initialization of
grpc_call.This behavior can be overridden using
GRPC_ARENA_INIT_STRATEGYenvironment variable.
I had two more changes in the following files which I will skip in the
PR because they are purely performance changes for cache
coherency. I will upload that as a separate patch once this PR
is merged:
src/core/lib/channel/channel_stack.cc
src/core/lib/surface/call.cc
This change is