
PHP benchmark: report histogram and unconstrained scenarios#12760

Merged
ZhouyihaiDing merged 1 commit into grpc:master from ZhouyihaiDing:opt_php_benchmark
Oct 8, 2017

Conversation

@ZhouyihaiDing
Contributor

@ZhouyihaiDing ZhouyihaiDing commented Sep 28, 2017

  1. Currently, each time the PHP client makes a call, it uploads the latency result through a streaming RPC immediately, and that time is counted in the benchmark time. Now, it sends histogram stats to the Ruby proxy once every 2000 ping-pongs.

QPS tests on my local machine (protobuf C extension, 20s warm-up, 60s benchmark):

QPS            before    after
unary          2865.9    4166.7
streaming      2898.0    4200.0

  2. When an unconstrained scenario wants to use multiple channels, Ruby creates multiple PHP clients and assigns a server target to each of them.
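The batching change described above can be sketched as follows. This is a minimal illustration, not the actual benchmark code; `LatencyBatcher` and its flush callback are hypothetical names standing in for the client's buffer and the streaming RPC write:

```ruby
# Hypothetical sketch of the batching above: instead of one stats RPC
# per call, buffer per-call latencies and flush one message per
# BATCH_SIZE ping-pongs via the supplied callback.
class LatencyBatcher
  BATCH_SIZE = 2000

  def initialize(&flush)
    @flush = flush       # stands in for the streaming RPC write
    @pending = []
    @flushes = 0
  end

  def record(latency_seconds)
    @pending << latency_seconds
    return if @pending.size < BATCH_SIZE
    @flush.call(@pending)
    @flushes += 1
    @pending = []
  end

  attr_reader :flushes
end

sent = []
batcher = LatencyBatcher.new { |batch| sent << batch.size }
5000.times { batcher.record(0.001) }
puts batcher.flushes  # two full batches sent; 1000 latencies still buffered
```

The point is simply that the per-call RPC overhead is amortized over 2000 calls, so the upload time stops dominating the measured benchmark window.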

Contributor

@vjpai vjpai Sep 28, 2017


Hi there,
It shouldn't be assumed that there are exactly 3 marks. It is possible that we would want more statistics reporting points than that, and all the benchmarks are written to assume that there can be an unlimited number of marks. That's why the mark has a reset field: to allow some marks to specify that the stats should be cleared. It's fine to react to marks differently from other operations, but I don't think you should assume a count of marks.

Contributor Author


Thanks for the correction. Then I also can't assume that only the last mark is for the benchmark.
If a middle mark is used for the benchmark, uploading the data still consumes time from the next mark, which lowers the QPS. I didn't see where I can pause the driver to let Ruby finish uploading before the next mark starts.
I guess I can keep using the signal approach to let PHP know when a mark is called, but instead of uploading all the latencies, I should implement a histogram on the PHP side and upload it once before each mark.
Do you have any suggestions for letting PHP know when to upload, other than using a signal? The response from the server seems useless because it stays the same until the stream is closed.

Contributor Author


Is it right that "reqs.each do |req|" works like this?

while (1) {
  in = stream.recv();  // the program will hang here for the warm-up time, costing lots of benchmark time
  case in.isSetup: ...
  case in.isMark: get data from PHP, upload the histogram to the driver
}

Thus in the in.isMark case, the streaming upload will always be expensive, and it costs time from the next mark. What I'm doing now is sending the split latencies together, which may speed up cache or buffer re-use or something. But the best way is just to upload once, right?

Contributor

@vjpai vjpai Sep 29, 2017


I think the right thing is to restructure the proxy worker. Right now the PHP client talks to the proxy worker over a 1-sided client-side streaming RPC called ReportTime. Instead, it should be changed to a bidi-streaming RPC. On the bidi streaming RPC, the proxy worker should send each mark to the PHP client. The PHP client should read that and then respond with a histogram that it has been collecting over the execution. The proxy worker can then pass that histogram back to the driver. That will require maintaining a histogram in the PHP client and sending a ClientStats proto back to the proxy worker on a mark. Does that make sense? Maybe the Quit and other messages could also be passed from the proxy worker to the PHP client like that to take signal out of the mix.
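The proposed bidi flow can be sketched without the grpc gem by modeling the two stream directions as a pair of queues: the proxy worker pushes marks (and quit) to the client, and the client replies with the stats it has accumulated, resetting when asked. The message shapes below are illustrative, not the real benchmark protos:

```ruby
# Minimal sketch of the proposed restructuring: the bidi stream is
# modeled as two Queues (illustrative message shapes, not real protos).
to_client = Queue.new   # proxy worker -> PHP client: marks / quit
to_proxy  = Queue.new   # PHP client -> proxy worker: stats replies

client = Thread.new do
  count = 0
  loop do
    msg = to_client.pop
    case msg[:type]
    when :call     # stand-in for one benchmark ping-pong being counted
      count += 1
    when :mark     # reply with stats collected so far; reset if asked
      to_proxy << { calls: count }
      count = 0 if msg[:reset]
    when :quit
      break
    end
  end
end

3.times { to_client << { type: :call } }
to_client << { type: :mark, reset: true }
2.times { to_client << { type: :call } }
to_client << { type: :mark, reset: false }
to_client << { type: :quit }
client.join

stats = []
stats << to_proxy.pop until to_proxy.empty?
p stats  # two stats replies: 3 calls before the reset mark, then 2
```

In the real setup the :mark message would carry the driver's Mark proto and the reply would be a ClientStats containing the histogram; the sketch only shows the request/response pairing on one stream that removes the need for signals.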

Contributor Author


Using a bidi stream sounds cool. I'll try an async approach, or fork a child process listening to this bidi-streaming RPC in a sync way. Thanks for your help!
I'll close this PR and reopen it once I finish.

Contributor


Oh yes, I forgot about that point: neither of those options is really natural in PHP. I think that's why I went with the 1-directional streaming approach in the first place. It probably needs better investigation then.

Contributor

@vjpai vjpai Sep 29, 2017


Another possibility: go with the example that you're currently using (storing the latency results in an array and sending it out as a batch) but send it out periodically like once every 100 ms or something so that you get the batching effects that you're talking about but without having to alter the framework altogether. The important thing is just to send far fewer stats-reporting RPCs.
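That time-based variant can be sketched as below. It is a rough illustration with hypothetical names (`PeriodicReporter`, the flush callback), driven here by a fake integer-millisecond clock so the example is deterministic; a real implementation would use something like Process.clock_gettime:

```ruby
# Sketch of the suggested time-based batching: keep appending latencies
# and flush the whole array as one batch once the interval has elapsed.
class PeriodicReporter
  def initialize(flush_interval_ms, clock, &flush)
    @interval = flush_interval_ms   # e.g. 100 for "once every 100 ms"
    @clock = clock                  # -> integer milliseconds
    @flush = flush                  # stands in for the batched stats RPC
    @pending = []
    @last_flush = clock.call
  end

  def record(latency)
    @pending << latency
    now = @clock.call
    return if now - @last_flush < @interval
    @flush.call(@pending)
    @pending = []
    @last_flush = now
  end
end

# Deterministic demo: one call per simulated millisecond.
t = 0
batches = []
reporter = PeriodicReporter.new(100, -> { t }) { |b| batches << b.length }
300.times do
  t += 1
  reporter.record(0.001)
end
puts batches.inspect  # [100, 100, 100]
```

The effect is the same batching win as the count-based approach, but the reporting cadence stays bounded even when the QPS varies.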

Contributor Author


I'm still using a signal to let them talk to each other. I tried creating a child process and letting a streaming RPC listen there, but it seems gRPC PHP doesn't work in a child process (the same code works in the parent but hangs in the child). I don't know yet whether it's a PHP problem or a gRPC PHP problem.
Currently every mark triggers a PHP upload, and it continues until the data is uploaded.
I'll implement the histogram on the PHP side next.

@ZhouyihaiDing
Contributor Author

I tried to create a child process in PHP and let it handle the stats stream, but gRPC seems to have a problem inside the child process, so I can only handle the signal inside the main process.
Ticks also have an influence on the PHP client side, but setting a time alarm also needs ticks. I'll check whether sending the histogram for every ping-pong is faster once I implement histogram merging on the Ruby side.
Each signal blocks the Ruby proxy, because it has to wait until the PHP program has started or the data has been transferred.
Bucket data can't be sent back to the Ruby proxy; I am checking why.

@ZhouyihaiDing ZhouyihaiDing force-pushed the opt_php_benchmark branch 2 times, most recently from ad2cdfb to 2d52819 on October 2, 2017 at 16:00
@ZhouyihaiDing
Contributor Author

Results show that batch sending once every 1000 calls is faster than using a signal as the trigger.

@grpc grpc deleted a comment from grpc-testing Oct 2, 2017
@ZhouyihaiDing ZhouyihaiDing force-pushed the opt_php_benchmark branch 2 times, most recently from 41d4ac6 to b1365d4 on October 2, 2017 at 18:29
@grpc grpc deleted a comment from grpc-testing Oct 2, 2017
@ZhouyihaiDing
Contributor Author

I changed it on the Ruby side:

  1. Each client channel creates a thread which starts a PHP client. (Using a thread seems unnecessary, but starting a PHP client is really slow; most of the time the first mark is called before the PHP client has initialized. Using a signal to keep the order would introduce ticks on the PHP side, which slows the program down.)
  2. PHP reports a histogram to Ruby via "report_hist" once every 2000 ping-pongs. It's faster than using a signal to trigger uploading the histogram.

Can you please help me review the Ruby code? @vjpai I changed proxy-server.rb and histogram.rb.
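The reason merging histograms on the Ruby side is cheap can be shown with a small sketch. This is not the actual histogram.rb; the class, its resolution default, and its bucketing formula are illustrative of the exponentially sized buckets the gRPC benchmarks use, where merging is just adding per-bucket counts:

```ruby
# Illustrative exponential-bucket histogram: values map to buckets by
# log base (1 + resolution), so merging is a per-bucket count addition.
class Hist
  attr_reader :buckets, :count

  def initialize(resolution = 0.01)
    @multiplier = 1.0 + resolution
    @buckets = Hash.new(0)   # bucket index -> count
    @count = 0
  end

  def bucket_for(value)
    value <= 1.0 ? 0 : (Math.log(value) / Math.log(@multiplier)).to_i
  end

  def add(value)
    @buckets[bucket_for(value)] += 1
    @count += 1
  end

  # The proxy can fold many per-client histograms into one before
  # reporting to the driver, instead of forwarding raw latencies.
  def merge!(other)
    other.buckets.each { |idx, n| @buckets[idx] += n }
    @count += other.count
    self
  end
end

a = Hist.new
b = Hist.new
1000.times { a.add(500.0) }
1000.times { b.add(500.0) }
a.merge!(b)
puts a.count          # 2000
puts a.buckets.size   # 1 (identical latencies land in the same bucket)
```

Since a merge touches only the occupied buckets, folding one histogram per 2000 ping-pongs into the proxy's running total is far cheaper than streaming every individual latency to the driver.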

Contributor


space after if for consistency

Contributor Author


Done.

Contributor


add a space around the = for consistency

Contributor Author


Done.

Contributor


Is this intended?

Contributor Author


Oh, I was using this to print the benchmark scenarios from other languages, to see how many channels / RPCs per channel they use.
I'll delete it.

@grpc grpc deleted a comment from grpc-testing Oct 5, 2017
@grpc-testing

[microbenchmarks] No significant performance differences


@grpc-testing

[trickle] No significant performance differences


@grpc-testing

[microbenchmarks] No significant performance differences


@ZhouyihaiDing
Contributor Author

Asan C (internal CI) -
Basic Tests C++ Linux [opt] (internal CI) - (100 failures in last 24h) json_run_localhost:cpp_protobuf_async_client_unary_1channel_64wide_128Breq_8MBresp_secure
Basic Tests MacOS [opt] (internal CI) - #12857
Basic Tests Windows [dbg] (internal CI) - #12858
Interop Cloud-to-Prod Tests (internal CI) - #10763
Tsan C (internal CI) - #12859

Contributor


nit: I'd just enumerate "php7" and "php7_protobuf_c" here; no need to use wildcards.
Also, there's no reason to pass $CONFIG as an argument; that variable will be available in build_performance_php.sh anyway. Taking an env variable and passing it as a param only to make it an env variable again seems unnecessary.

Contributor Author


Done. I used wildcards in case we need to add benchmarks for both php and php7, but that requires switching the language version in the VM, so for now I only focus on building the benchmark for php7.

Contributor


Also, it looks like you are now building PHP twice: once for php7 and once for php7_protobuf_c. Given that the build process seems to be the same regardless of using protobuf_c, maybe you can just build everything as language "php"?

Basically, that would mean differentiating the language name in scenario_config.py ( = str(language)) from the "language to build" by introducing language.lang_to_build().

Building PHP twice is not a problem if:
1.) the second build takes negligible time (everything already built and installed), and
2.) you can guarantee that the PHP builds won't run concurrently and therefore collide with each other.

Contributor


So the builds are not running concurrently, as they are executed one by one by build_performance.sh.

One possible trick (and possibly the simplest): if both php7 and php7_protobuf_c are passed to build_performance.sh, it can remember that PHP has already been built when the first build is triggered (PHP_ALREADY_BUILT=the_lang_that_built_it), and the second build request can be skipped ("Skipping PHP build as already built by $PHP_ALREADY_BUILT").
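A minimal sketch of that guard might look like the following. The function name and structure are illustrative, not the real build_performance.sh:

```shell
# Hypothetical sketch of the skip-if-built guard suggested above.
PHP_ALREADY_BUILT=""

build_php_once() {
  lang="$1"
  if [ -n "$PHP_ALREADY_BUILT" ]; then
    echo "Skipping PHP build as already built by $PHP_ALREADY_BUILT"
    return 0
  fi
  echo "Building PHP for $lang"
  # ... the actual PHP build steps would run here ...
  PHP_ALREADY_BUILT="$lang"
}

build_php_once php7
build_php_once php7_protobuf_c
```

The first invocation does the build and records which language triggered it; the second returns immediately, so passing both php7 and php7_protobuf_c costs only one build.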

Contributor Author


Done, thank you for the suggestions. I check PHP_ALREADY_BUILT and build only once.

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



Contributor

@jtattermusch jtattermusch left a comment


LGTM for the changes to performance testing scripts (.py and .sh). Thanks for addressing my comments, I think now things look much better!

After merging, please monitor the benchmark suite for a while to check that PHP builds are not introducing extra flakiness.

@grpc-testing

[trickle] No significant performance differences

@grpc-testing

[microbenchmarks] No significant performance differences


@grpc-testing

[trickle] No significant performance differences

Contributor

@vjpai vjpai left a comment


LGTM. I'm fine with merging, but I would like an opinion from @jtattermusch or @adelez: does it make sense to dashboard the PHP QPS test? The model of parallelism used here (a separate process per client channel) is completely different from the other wrapped languages (which use one process and may have multiple threads). I know this is because multithreading isn't idiomatic in PHP, but it might muddle the actual contents of the multilingual dashboard.


@grpc-testing

[trickle] No significant performance differences

@grpc-testing

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.counters.new: 1
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_InProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.opt.new: 1
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_InProcessCHTTP2_NoOpMutator_NoOpMutator__2M_2_0.counters.new: 1


[microbenchmarks] No significant performance differences

@ZhouyihaiDing
Contributor Author

ZhouyihaiDing commented Oct 6, 2017

Not related:
Basic Tests Windows [dbg] (internal CI) - #12872 cmake flow_control.cc
Basic Tests Windows [opt] (internal CI) - #12872
Artifact Build Windows (internal CI)
Portability Tests Windows [Build Only] (internal CI) - #12872
Interop Cloud-to-Cloud Tests (internal CI) - #12875 all node tests fail
Interop Cloud-to-Prod Tests (internal CI) - #12875
Portability Tests Linux [Build Only] (internal CI) - #12871
Basic Tests Multi-language Linux (internal CI) - #12886 - php
Basic Tests C++ Linux [dbg] (internal CI) - #12887
Basic Tests C++ Linux [opt] (internal CI) - #12887
Basic Tests MacOS [dbg] (internal CI) - #12889
Basic Tests MacOS [opt] (internal CI) - #12889
Msan C (internal CI) - #12892
Asan C (internal CI) - #12892

@fengli79
Contributor

fengli79 commented Oct 6, 2017

@vjpai @adelez @jtattermusch
I actually have the same question: what's the expected behavior in multi-core testing?
I'm under the impression that either the client or the server may become the bottleneck and occupy all the cores; however, that sounds like a blurry test scenario for QPS measurement.
For example, in an 8-core client and 8-core server test, some languages may hit the bottleneck on the client side while others hit it on the server side. Those numbers are not really comparable.
Ideally it should be:

  1. Have different sets of tests that limit the cores on the client and server sides. For example, give the client 8 cores and the server enough CPUs to serve all the traffic, and make a different test case that limits the server to 8 cores and gives the client enough CPUs.
  2. On the machine with limited CPU cores, adjust the concurrency and thread model to get the maximum QPS. I expect the QPS under a fixed number of cores to be a curve as we adjust the concurrency factors (threads/processes, etc.).

@grpc-testing

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.opt.old: 3
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.opt.new: 1
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_InProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.counters.old: 1
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__2M_1_0.counters.old: 1


[microbenchmarks] No significant performance differences


8 participants