http: speed up shutdown #6990

laanwj · 2015-11-11T17:12:13Z

Pardon a bit of iteration. This continues/fixes #6719.

event_base_loopexit was not doing what I expected it to. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with N seconds, even if nothing is pending.

Solve it in a different way: give the event loop thread time to exit out of itself, and if it doesn't, send loopbreak (and then join it).

This speeds up the RPC tests a lot, each exit incurred a 10 second overhead, with this change there should be no shutdown overhead in the common case and up to two seconds if the event loop is blocking.

As a bonus this breaks dependency on the global boost::thread_group, as the HTTP server minds its own offspring.

Time for rpctests with this patch:

real    7m50.215s
user    0m30.152s
sys     0m11.752s

Without:

real    13m12.069s
user    0m30.412s
sys     0m11.168s

dcousens · 2015-11-11T21:01:40Z

event_base_loopbreak was not doing what I expected it to. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with N seconds, even if nothing is pending.

~~Sounds like you wanted event_base_loopexit? Ref:~~

The next event_base_loop() iteration after the given timer expires will complete normally (handling all queued events) then exit without blocking for events again.

~~In any case, event_base_loopbreak should still exit immediately, stopping any loops after the next event:~~

event_base_loop() will abort the loop after the next event is completed; event_base_loopbreak() is typically invoked from this event's callback. This behavior is analogous to the "break;" statement.

Are you sure this isn't symptomatic of something else?

Reference: http://www.wangafu.net/~nickm/libevent-2.0/doxygen/html/event_8h.html#a07a7599e478e4031fa8cf52e26d8aa1e

edit: Right, never mind, you were using event_base_loopexit, hence the confusion. You're OP (before it was edited) makes it sound like you switched from event_base_loopbreak.

dcousens · 2015-11-11T21:06:00Z

src/httpserver.cpp

Out of interest, why do we have to wait [a hard coded timeout]? Why can't we wait for all events to finish (and maybe just ignore all new incoming connections?)

all events to finish

may take a long time (minutes) depending what was going in in the past?

@MarcoFalke but this doesn't stop new connections (or is that done elsewhere?) in that waiting time. So conceptually, the status quo could be the same?
That is, new connections coming in, old ones still finishing.

New connections are stopped just above, see where the accept sockets are torn down.

Though it does look like we should close connected sockets in http_reject_request_cb as well for good measure. @laanwj Is there a reason to keep them open once they've had a req rejected?

Waiting for all events to finish could take forever. There's tons of ways that someone could keep a connection option by slowly sending data. Also, timing events (such as the walletpassphrase timeout) may hold up the event queue.

Just breaking off everything in progress immediately prevents the answer to stop from getting back. We could just declare this a feature of course and be done with it... But it was a race condition, sometimes you would get a reply, sometimes not. We could define stop to cease all RPC activity immediately. But I think I like this better.

There AFAIK is no way to ask libevent, nor libevent_http if the events in the queue are important enough to wait for or not. And specially tagging the connection that sends 'stop' is quite ugly, not sure if even possible without diving into internal data structures.

For context read #6719.

I'm not 100% happy with this solution either, but I wasn't able to find another way to do it, and it's better than the previous one.

@cfields sure, you can close the connection afterwards there, but that doesn't solve this issue.

@MarcoFalke but this doesn't stop new connections (or is that done elsewhere?) in that waiting time. So conceptually, the status quo could be the same?

It does:

// Unlisten sockets BOOST_FOREACH (evhttp_bound_socket *socket, boundSockets) { evhttp_del_accept_socket(eventHTTP, socket); } // Reject requests on current connections evhttp_set_gencb(eventHTTP, http_reject_request_cb, NULL);

This closes the port and makes new requests error (so that no new work items are created). But that's best-effort. Current connections could still do e.g. a slowloris attack. The two seconds are a belt and suspenders, nothing more.

dcousens · 2015-11-11T21:06:46Z

utACK

jonasschnelli · 2015-11-12T07:46:07Z

utACK.

laanwj · 2015-11-12T08:48:18Z

Some more information: it appears that with libevent master, this issue doesn't exist.
event_base_loopexit does actually do what I expect it to in the OP. As that is the version I'm normally using, that is why I didn't notice before.

Edit: so I think we should still do this for backward compat, but at least I'm not crazy :)

dcousens · 2015-11-13T01:00:41Z

@laanwj maybe just inline a comment to mention this might be fixed in latest version of libevent? Could be useful for some future refactoring, save someone trying to establish whether there was a reason we moved away from it that they aren't aware of.

laanwj · 2015-11-13T09:14:04Z

Will do

Edit: OK, added comment and updated commit message.

laanwj · 2015-11-13T09:14:58Z

Huh! I pushed this branch to bitcoin/bitcoin instead of my own repository. Oh well, no harm done, I'll just remove it again when it's merged.

This continues/fixes #6719. `event_base_loopbreak` was not doing what I expected it to, at least in libevent 2.0.21. What I expected was that it sets a timeout, given that no other pending events it would exit in N seconds. However, what it does was delay the event loop exit with 10 seconds, even if nothing is pending. Solve it in a different way: give the event loop thread time to exit out of itself, and if it doesn't, send loopbreak. This speeds up the RPC tests a lot, each exit incurred a 10 second overhead, with this change there should be no shutdown overhead in the common case and up to two seconds if the event loop is blocking. As a bonus this breaks dependency on boost::thread_group, as the HTTP server minds its own offspring.

laanwj · 2015-11-13T11:24:38Z

~~@theuni can you take a look here as well?~~
Edit: nm, you already did, but didn't ack

gmaxwell · 2015-11-13T19:32:03Z

ACK.

a264c32 http: speed up shutdown (Wladimir J. van der Laan)

dcousens · 2015-11-14T08:51:27Z

Awesome, thanks @laanwj :)

maflcko · 2015-11-15T13:05:52Z

Post merge tested ACK!

@laanwj this makes travis a lot faster. (runtime reduced for some rpc tests by 90%)

Before:

$ time for i in {1..10};do qa/pull-tester/rpc-tests.py disablewallet;done
real    1m53.003s
user    0m1.135s
sys 0m0.437s

$ time for i in {1..10};do qa/pull-tester/rpc-tests.py wallet;done
real    10m4.888s
user    0m21.829s
sys 0m8.239s

After:

disablewallet:
real    0m13.809s
user    0m1.166s
sys 0m0.510s

wallet:
real    5m7.703s
user    0m22.128s
sys 0m8.222s

Fix various thread assertion errors caused during shutdown. Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#6719 - bitcoin/bitcoin#6990 - bitcoin/bitcoin#8421 - Second commit only in this PR - bitcoin/bitcoin#11006 I've cherry-picked the relevant commits, along with a note in each commit referring to the original Bitcoin commit ID (and the Zcash issue numbers where applicable). I've tested each issue with/without these patches applied. Closes #2214, #2334, and #2554.

HTTP Server cherries from Core: bitcoin/bitcoin#6719 - Make HTTP server shutdown more graceful bitcoin/bitcoin#6859 - http: Restrict maximum size of http + headers bitcoin/bitcoin#6990 - http: speed up shutdown bitcoin/bitcoin#7966 - http: Do a pending c++11 simplification handling work items bitcoin/bitcoin#8421 - httpserver: drop boost (#8023 dependency) bitcoin/bitcoin#11006 - Improve shutdown process

laanwj added the RPC/REST/ZMQ label Nov 11, 2015

dcousens reviewed Nov 11, 2015
View reviewed changes

laanwj force-pushed the 2015_11_threadexit branch 2 times, most recently from 1ec9198 to febcc13 Compare November 13, 2015 09:31

laanwj force-pushed the 2015_11_threadexit branch from febcc13 to a264c32 Compare November 13, 2015 10:10

gmaxwell merged commit a264c32 into master Nov 13, 2015

gmaxwell added a commit that referenced this pull request Nov 13, 2015

Merge pull request #6990

dbd2c13

a264c32 http: speed up shutdown (Wladimir J. van der Laan)

maflcko mentioned this pull request Nov 16, 2015

errors compiling httpserver.cpp #7031

Closed

laanwj deleted the 2015_11_threadexit branch November 30, 2015 15:02

jnewbery mentioned this pull request Aug 8, 2017

Improve shutdown process #11006

Merged

jasondavies mentioned this pull request Oct 23, 2017

Fix various thread assertion errors caused during shutdown. zcash/zcash#2555

Merged

dagurval mentioned this pull request Jan 27, 2018

HTTP Server cherries bitcoinxt/bitcoinxt#315

Merged

sickpig mentioned this pull request Mar 12, 2018

HTTP servers Core ports BitcoinUnlimited/BitcoinUnlimited#1005

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

http: speed up shutdown #6990

http: speed up shutdown #6990

Uh oh!

Conversation

laanwj commented Nov 11, 2015

Uh oh!

dcousens commented Nov 11, 2015

Uh oh!

dcousens Nov 11, 2015

Choose a reason for hiding this comment

Uh oh!

maflcko Nov 11, 2015

Choose a reason for hiding this comment

Uh oh!

dcousens Nov 11, 2015

Choose a reason for hiding this comment

Uh oh!

theuni Nov 12, 2015

Choose a reason for hiding this comment

Uh oh!

laanwj Nov 12, 2015

Choose a reason for hiding this comment

Uh oh!

laanwj Nov 12, 2015

Choose a reason for hiding this comment

Uh oh!

dcousens commented Nov 11, 2015

Uh oh!

jonasschnelli commented Nov 12, 2015

Uh oh!

laanwj commented Nov 12, 2015

Uh oh!

dcousens commented Nov 13, 2015

Uh oh!

laanwj commented Nov 13, 2015

Uh oh!

laanwj commented Nov 13, 2015

Uh oh!

laanwj commented Nov 13, 2015

Uh oh!

gmaxwell commented Nov 13, 2015

Uh oh!

dcousens commented Nov 14, 2015

Uh oh!

maflcko commented Nov 15, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants