Skip to content

Run a local DNS server in run_tests and add c-ares tests with it#12210

Merged
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:local_dns_server
Sep 15, 2017
Merged

Run a local DNS server in run_tests and add c-ares tests with it#12210
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:local_dns_server

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented Aug 16, 2017

Starts running a local DNS server (using dnslib) in the same process as the port server. It also adds a revised version of the c-ares test in #12010 to run against it, and fixes a couple of memory leaks within the grpc_cares_wrapper.c.

This keeps resolver_test_record_groups.yaml as the source of truth for all test records.

To run the tests added here:
tools/run_tests/run_tests.py -l c++ -r resolver_component_test

Planning on creating a follow-up PR that uses what's added here for running GCE DNS integration tests (as a revised version of #12010)

-------------- update: ----------------

  • fixed a bug in that the this PR's port server's /quitquitquit handler would still leave the port server process running. On shared resource workers including the jenkins mac and jenkins windows workers, this bug brought up a larger issue with this PR, in that changes to the set of DNS test records must always be backwards compatible as long as tests are sharing resources (IIC while there are still mac/windows tests on jenkins)

(note all workers right now are ok; non-docker workers failed to start the faulty port server in this PR because dnslib wasn't present)

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! Most of my comments are minor, but there are a few substantive ones. Please let me know if you have any questions about any of this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS doesn't allow individual TXT records larger than 255 characters, but it does allow you to split up a TXT record into multiple strings. For example:

foo.com. IN TXT "grpc_config=[{\"serviceConfig\":{" "\"loadBalancingPolicy\":\"round_robin\"}}]"

I don't know the protocol details, but I do know that DNS can somehow represent the difference between multiple independent TXT records and a single split-up TXT record. For example, if I add a second TXT record to the zone file in addition to the above, dig shows the following:

roth@segfault:~/grpc> dig @127.0.0.1 -p 1234 foo.com. txt
; <<>> DiG 9.9.5-3ubuntu0.15-Ubuntu <<>> @127.0.0.1 -p 1234 foo.com. txt
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60627
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 1, ADDITIONAL: 2
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;foo.com.                       IN      TXT
;; ANSWER SECTION:
foo.com.                3600    IN      TXT     "a=12345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
0123456789012345678901234567890123456789012345678901234567890"
foo.com.                3600    IN      TXT     "grpc_config=[{\"serviceConfig\"
:{" "\"loadBalancingPolicy\":\"round_robin\"}}]"
;; AUTHORITY SECTION:
foo.com.                3600    IN      NS      ns.foo.com.
;; ADDITIONAL SECTION:
ns.foo.com.             3600    IN      A       127.0.0.1
;; Query time: 0 msec
;; SERVER: 127.0.0.1#1234(127.0.0.1)
;; WHEN: Thu Aug 17 08:09:58 PDT 2017
;; MSG SIZE  rcvd: 417

Note that the two parts of the service config string are shown as a single split-up entry, but the other TXT record is shown as an independent record.

Note that because the split-up record appears differently at the protocol level, the c-ares code has special handling for this case. So we do need to find a way to test this.

From a quick glance at the dnslib code, I'm not sure it can handle this right now. If not, perhaps we can fix it to do that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also make sure to test the case where the sum total of the TXT records is larger than 512 bytes, because that will cause DNS to fall back from UDP to TCP. (I was having some problems testing this with the service config code, so this may not work right yet.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marked the UDP -> TCP test case as a TODO.

from offline chat, this needs some change to the ares wrapper code to work.

The problem that's currently happening is:

  1. c-ares receives a truncated DNS response and re-sends the query over TCP
  2. c-ares now signals interest in the TCP socket and the previous UDP socket (sometimes needed), and we still register notify_on_read on the UDP socket
  3. fd_orphan fails because the callback is still pending

I think we need to "maybe cancel" the callback on all sockets to cleanup - cc @zyc

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to tag @y-zeng here.

Yuchen, do you have time to look at this? We don't need to block this PR on it, but I think we will need to fix this before we can make c-ares the default DNS resolver.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++, instead of using static, it's cleaner to put things in an anonymous namespace.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, moved everything into an anonymous namespace

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is closing the testing namespace, not the anonymous namespace.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/e a/expected addresses/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can return bool.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got rid of the whole matches_any function

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking that the value is present in expected_addrs isn't quite precise enough. If the expected addresses are (A, B) and the actual addresses are (A, A), the test will not fail as it should.

Since we're using C++ here, can we use some of the GMock constructs to make this more readable? For example, if we can get both the expected and actual addresses into vectors of grpc_lb_address objects, perhaps we can use something like UnorderedElementsAreArray to concisely check the result?

Copy link
Copy Markdown
Contributor Author

@apolcyn apolcyn Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some trouble getting the gmock construct to work - I'm not certain why but I think think that it's because this test isn't following a needed framework (running as an ordinary binary, and not with RUN_ALL_TESTS and TEST_ test cases).

Can look into this further if we want to use that though

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why we can't structure this like any other C++ test, where the individual tests are defined using the TEST() macros and invoked via RUN_ALL_TESTS().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the service config specifies the LB policy, then we should also check that the resolver has set GRPC_ARG_LB_POLICY_NAME.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I'd like to check offline though that the expected lb policy names that I have set (which are the ones appearing), are correct. E.g., my understanding is that if an SRV address is resolved, then grpclb should be used - but it appears that the policy name channel arg is actually null in that case

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. The resolver's behavior is that it will only set GRPC_ARG_LB_POLICY_NAME if the service config explicitly specifies an LB policy, otherwise it is left unset. When the resolver returns its results to the client_channel code, that's where we check whether any of the addresses are balancer addresses and if so, force the use of grpclb regardless of what policy (if any) was present in the service config.

https://github.com/grpc/grpc/blob/master/src/core/ext/filters/client_channel/client_channel.c#L389

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want case-insensitive comparison here -- the JSON data is compared in a case-sensitive way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually an end-to-end test. Can we rename it to something like resolver_test? (Please update function names, file names, etc).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed this to resolver_component_test...

I wanted make it more unique than resolver_test, for namespacing/regex matching purposes, but can change if we want to

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 23, 2017

tried to address all comments, PTAL

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. Just a few issues remaining.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just say return test_case['expected_lb_policy'] or ''.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the ? true : false part.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use C++ style names in this test -- e.g., ConvertStringToBool() instead of convert_string_to_bool().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted some but not all, I was a little unsure here. Kept types used by C funcs (e.g., args_struct), and functions called back from C, as C-style, and converted the top-level to C++

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is written in C++, so I think it's fine to use C++ style even for things that are used as C callbacks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please declare variables right before they are used, rather than declaring them all at once at the top of a function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest calling this is_balancer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. The resolver's behavior is that it will only set GRPC_ARG_LB_POLICY_NAME if the service config explicitly specifies an LB policy, otherwise it is left unset. When the resolver returns its results to the client_channel code, that's where we check whether any of the addresses are balancer addresses and if so, force the use of grpclb regardless of what policy (if any) was present in the service config.

https://github.com/grpc/grpc/blob/master/src/core/ext/filters/client_channel/client_channel.c#L389

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having an expected_config_index attribute, I suggest having a expected_service_config attribute that contains the expected string. This is because when the TXT record contains multiple choices, the resolver is responsible for picking one of those choices. The service config it returns should include only the selected choice -- i.e., it's a subset of the TXT record.

For example, one of the tests you've defined has the following TXT record:

grpc_config=[
  {"percentage":0,
   "serviceConfig":{"loadBalancingPolicy":"round_robin","methodConfig":[{"name":[{"method":"Foo","service":"NeverPickedService","waitForReady":true}]}]}},
  {"percentage":100,
   "serviceConfig":{"loadBalancingPolicy":"round_robin","methodConfig":[{"name":[{"method":"Foo","service":"AlwaysPickedService","waitForReady":true}]}]}}]

In this case, the resolver should return only the second service config:

{"loadBalancingPolicy":"round_robin","methodConfig":[{"name":[{"method":"Foo","service":"AlwaysPickedService","waitForReady":true}]}]}}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - previously string found in python, that change simplified the code

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why we can't structure this like any other C++ test, where the individual tests are defined using the TEST() macros and invoked via RUN_ALL_TESTS().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The format might be a bit less ambiguous if we used a different delimiter character for the (address,is_balancer) pairs than between the pairs. But this is fine as-is too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to tag @y-zeng here.

Yuchen, do you have time to look at this? We don't need to block this PR on it, but I think we will need to fix this before we can make c-ares the default DNS resolver.

@apolcyn apolcyn requested a review from jtattermusch August 23, 2017 17:29
@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 24, 2017

tried to address all comments again, also renamed records

PTAL

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 24, 2017

cc @jtattermusch for how this is tying in with run_tests.py and the port server.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think port_server.py also starting a DNS server as a side effect is a pretty dirty solution (no one would expect such behavior from something called port_server). Also, port server is started in other circumstances - like in performance benchmarks, where you don't need the DNS server at all. I think overall this solution will lead to problems in the future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolasnoble recommended this solution, since we need a hermetic DNS server in the tests. I suppose we could have the tests run yet another binary, but that seems unnecessarily complicated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't say how the is_balancer bit is encoded.

Also, the parens seem a little cumbersome. I was thinking of something more like "address0,is_balancer0;address1,is_balancer1;..." -- i.e., use comma between the two elements of each pair and semicolon between the pairs. Or perhaps "address0=is_balancer0,address1=is_balancer1,...".

Copy link
Copy Markdown
Member

@markdroth markdroth Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please eliminate unnecessary blank lines within functions.

https://google.github.io/styleguide/cppguide.html#Vertical_Whitespace

In general, I find that if you really need to separate code into logical "chunks", it's better to use a comment line documenting the purpose of the next section than a blank line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to address this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you save this as a C++ string instead of a C string, then you can change check_service_config_result_locked() to just use EXPECT_EQ().

(Actually, even if you keep it as a C string, you can use EXPECT_STREQ().)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolasnoble recommended this solution, since we need a hermetic DNS server in the tests. I suppose we could have the tests run yet another binary, but that seems unnecessarily complicated.

@markdroth
Copy link
Copy Markdown
Member

BTW, I meant to say, this is looking really good. I think the remaining issues are mostly cosmetic.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 24, 2017

After some chat, I am thinking that I will probably restructure how the port server is started: instead of starting one global DNS server along side the port server, a subprocess DNS server, with a dynamically assigned port, can be spawned per-test.

Doing that would isolate the DNS server from tests that don't need it, prevent need of changes to the port server, and also help in making changes to the test cases here (backwards compatibility of the test records is no longer a big deal since the DNS server is never shared across tests).

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 25, 2017

Addressed comments in most recent commit, and changed the DNS server to launch per-test run, when ran with run_tests.py. Note if running the test manually, we can still not spawn a sub-process and instead specify exiting server.

One thing that is significant here though is the race between the sub-process standing up it's server and the c-ares DNS query beginning - I found that a 1-second sleep between starting the server and beginning the query was needed to prevent failures.

With that sleep:

  • if using epollsig, the DNS query will time out.
  • with other poll engines, the query fails fast.

I'm not sure right now if the race there is something that can (or should) be solved with c-ares retry config, or if there is some unexpected behavior around this with the different pollers.

PTAL

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] Performance differences noted:
Benchmark                                                          cpu_time    real_time
-----------------------------------------------------------------  ----------  -----------
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/2M/2M        -12%        -13%
BM_UnaryPingPong<InProcessCHTTP2, NoOpMutator, NoOpMutator>/2M/2M  -4%         -5%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/2M/2M     -9%         -9%

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little concerned that having to start up a separate server for each test will make things more brittle, since there are a lot more moving parts here that need to be handled right. I'm not sure that all of the additional complexity is worth not having the DNS server in the port server binary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/of/or/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to address this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/:/,/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Cant/Can't/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest moving the flag checking down into main().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also log the pid, if there's an easy way to get at it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see an easy way to do this. Even with gpr_subprocess directly, the pid info appears to be platform-specific and not exposed. Using fork/exec directly I could, but I'd lean towards gpr_subprocess and its SubProcess (I think this wrapper is more portable too)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for == true.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for == true.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to remove the == true here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest moving this down into the main logic below.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put the rest of this in a main() function, and invoke it only if __name__ == '__main__'.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 25, 2017

IMO I agree with the earlier push back against the DNS server running with the port server. While the DNS server per-test subprocess adds more complexity to the test, I'm worried that the DNS server running with the port server adds too much long-lived and global state to the overall test environment, with higher risk of infrastructure bugs and manual work.

The biggest problems I see with the DNS server in the port server are:

  • It would be important that any change to resolver_test_record_groups.yaml (or the DNS server), be backward compatible (we can never stop serving a previously exiting record without disabling the test temporarily or flagging it is expected-to-fail for a couple of days).

  • Any change to either the DNS server or the test records also needs come with an update to the version number of the port server.

  • Need to make sure that the dnslib dependency is present on all machines that are running non-docker tests using the port server, even if they're not using the DNS server.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Aug 25, 2017

Is the void ExpectToken(const char *token, std::string expected_addrs) issue was fixed, since expected_addrs it's now passed-by-value?

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@markdroth
Copy link
Copy Markdown
Member

Is the void ExpectToken(const char *token, std::string expected_addrs) issue was fixed, since expected_addrs it's now passed-by-value?

Yes, that's fixed. The thing that I said wasn't fixed was that there are still a bunch of C-style names that should be converted to C++-style.

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning for having the DNS server be separate from the portserver. I am willing to give it a shot, but we're going to have to monitor it closely to make sure it doesn't fail.

What do you think about writing a wrapper shell script to get the port for the DNS server, start the server, run dig, and then run the test program with --local_dns_server_address=localhost:${port}? The wrapper script could trap SIGTERM and kill both the test and the DNS server when it terminates. That would (a) make it a bit more robust for handling termination and (b) more cleanly separate the test from the infrastructure that we need to run it automatically.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to remove the == true here.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

1 similar comment
@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] Performance differences noted:
Benchmark                                                              nows_per_iteration    real_time
---------------------------------------------------------------------  --------------------  -----------
BM_PumpStreamServerToClient<SockPair>/2M                                                     +6%
BM_StreamingPingPong<InProcessCHTTP2, NoOpMutator, NoOpMutator>/16M/1  -5%


} // namespace grpc

DEFINE_bool(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put these flag definitions up at the top of the file, right under the includes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

gpr_cv_init(&test_driver_cv);
int test_driver_done = 0;
register_sighandler();
std::thread sig_handling_thread(RunSigHandlingThread, test_driver,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't realize that there wasn't a form of SubProcess::Join() that takes a timeout. In that case, I guess this is fine as-is. It's a shame that we need so much machinery to do something so simple, but I don't see an obviously better approach.


void RunSigHandlingThread(SubProcess *test_driver, gpr_mu *test_driver_mu,
gpr_cv *test_driver_cv, int *test_driver_done) {
for (size_t i = 0; i < kTestTimeoutSeconds && !abort_wait_for_child; i++) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this would cause the response to SIGINIT / SIGTERM to be delayed. The overall mechanics of the loop would not be different -- we would still be blocking for no more than 1 second at a time.

What I'm thinking is something like this:

gpr_timespec deadline = gpr_time_add(
    gpr_now(GPR_CLOCK_MONOTONIC),
    gpr_time_from_seconds(kTestTimeoutSeconds, GPR_TIMESPAN));
while (true) {
  gpr_timespec now = gpr_now(GPR_CLOCK_MONOTONIC);
  if (gpr_time_cmp(now, deadline) <= 0) break;
  gpr_mu_lock(test_driver_mu);
  if (*test_driver_done) {
    gpr_mu_unlock(test_driver_mu);
    return;
  }
  gpr_timespan wait_deadline = gpr_time_min(
      deadline,
      gpr_time_add(now, gpr_time_from_seconds(1, GPR_TIMESPAN)));
  gpr_cv_wait(test_driver_cv, test_driver_mu, wait_deadline);
  gpr_mu_unlock(test_driver_mu);    
}

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] Performance differences noted:
Benchmark                                                      atm_add_per_iteration    nows_per_iteration
-------------------------------------------------------------  -----------------------  --------------------
BM_StreamingPingPongMsgs<MinTCP, NoOpMutator, NoOpMutator>/2M  -6%                      -4%
BM_UnaryPingPong<MinTCP, NoOpMutator, NoOpMutator>/2M/2M       -4%                      -6%

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] No significant performance differences

Copy link
Copy Markdown
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this!

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Sep 13, 2017

thanks for review!

FYI squashed the reviewed commits into the single commit with the message: "local dns server". Made a couple of changes, starting from "fix native asan failure in driver", to fix remaining found issues in tests.

@markdroth
Copy link
Copy Markdown
Member

Please squash all commits into one before merging. Thanks!

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_streaming_ping_pong.BM_StreamingPingPongWithCoalescingApi_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__2M_2_0.counters.new: 1


[microbenchmarks] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[trickle] No significant performance differences

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

[microbenchmarks] No significant performance differences

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Sep 14, 2017

test failures all look unrelated (will squash again before merging)

@markdroth
Copy link
Copy Markdown
Member

I think you still need approval from @nicolasnoble before you can merge.

@nicolasnoble
Copy link
Copy Markdown
Contributor

Ah, right, OWNERS and everything.

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Sep 15, 2017

test failures:

tests.cloud_to_prod:default:java:cancel_after_first_response
tests.cloud_to_prod:default:javaokhttp:large_unary
tests.cloud_to_prod:default:php:server_streaming
tests.cloud_to_prod:default:java:server_streaming

@apolcyn apolcyn merged commit 39e4fda into grpc:master Sep 15, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants