Skip to content

Conversation

@promag
Copy link
Contributor

@promag promag commented Feb 7, 2019

Call evhttp_free in the event loop to trigger server shutdown, in particular to close connections.

@promag
Copy link
Contributor Author

promag commented Feb 7, 2019

@sdaftuar regarding #15305, if you cherry pick commit to your branch you'll see the test runs in a couple of seconds.

Also, with the following patch:

--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -465,6 +465,7 @@ void StopHTTPServer()
     boundSockets.clear();
     if (eventBase) {
         LogPrint(BCLog::HTTP, "Waiting for HTTP event thread to exit\n");
+        event_base_dump_events(eventBase, stderr);
         // Exit the event loop as soon as there are no active events.
         event_base_loopexit(eventBase, nullptr);
         threadHTTP.join();

and running

test/functional/feature_abortnode.py --nocleanup

you will see (mind the path):

cat /var/folders/1v/8_69hby54nj2k3n6fywt44x80000gn/T/bitcoin_func_test_o4hopx4l/node0/stderr/tmp*
: Corrupted block database detected.
Please restart with -reindex or -reindex-chainstate to recover.
Inserted events:
Active events:
Error: Error: A fatal internal error occurred, see debug.log for details
Inserted events:
  0x7fb93d44b560 [fd  14] Read Persist Timeout=1549549849.031699
Active events:

@promag
Copy link
Contributor Author

promag commented Feb 7, 2019

Improves #15309.

@promag promag force-pushed the 2019-01-loopexit branch 2 times, most recently from edd9829 to a0f6915 Compare February 7, 2019 15:23
@promag
Copy link
Contributor Author

promag commented Feb 7, 2019

#15350 was merged, updated OP.

@laanwj
Copy link
Member

laanwj commented Feb 8, 2019

We just can't keep "fixing" this, can we?

@promag promag changed the title http: Exit the event loop as soon as there are no active events wip: http: Exit the event loop as soon as there are no active events Feb 8, 2019
@promag
Copy link
Contributor Author

promag commented Feb 8, 2019

This is a different issue, idle connections are stalling shutdown. We could decrease the timeout duration but I don't think that's a fix.

This is incomplete.

@ryanofsky
Copy link
Contributor

We just can't keep "fixing" this, can we?

The libevent HTTP interface is kind of crazy, so I don't think it's surprising that fixing this requires some trial and error. But it would be good to have tests for each issue that gets fixed to avoid going in circles.

If it would be possible to specify what should happen on shutdown for connections in different states (close headers? truncated responses? timeouts?) and write a more comprehensive client test that would of course be great, too.

@promag
Copy link
Contributor Author

promag commented Feb 8, 2019

@ryanofsky agree, and test/functional/feature_shutdown.py was added with that in mind. I'll add test for this change.

BTW, I think the case introduced by @sdaftuar in #15305 is new because the shutdown isn't triggered by stop command.

@maflcko maflcko modified the milestone: 0.18.0 Feb 13, 2019
@maflcko
Copy link
Member

maflcko commented Feb 18, 2019

Not sure if we should pull this into 0.18 at this point. Removing for now

@promag
Copy link
Contributor Author

promag commented Feb 18, 2019

@MarcoFalke sure I agree. The issue is on shutdown so it shouldn't bother too much to wait for connections to timeout - beside our functional tests.

@promag promag force-pushed the 2019-01-loopexit branch from a0f6915 to a856a30 Compare March 11, 2019 12:01
@maflcko
Copy link
Member

maflcko commented Jul 29, 2019

What is the status of this?

It has nothing to do with #16405, right?

@promag
Copy link
Contributor Author

promag commented Jul 31, 2019

It has nothing to do with #16405, right?

@MarcoFalke no I mean right. Now that #15305 is merged, I'll pick this again.

@promag
Copy link
Contributor Author

promag commented Aug 2, 2019

Currently in master 3a3d8b8

test/functional/feature_abortnode.py --nocleanup

2019-08-02T17:31:41.878000Z TestFramework (INFO): Initializing test directory /var/folders/dc/zhlby57d0vb9yrw5chl5wx6h0000gn/T/bitcoin_func_test_prjmeyz1
2019-08-02T17:31:42.898000Z TestFramework (INFO): Waiting for crash
2019-08-02T17:32:13.050000Z TestFramework (INFO): Node crashed - now verifying restart fails
2019-08-02T17:32:13.359000Z TestFramework (INFO): Stopping nodes
2019-08-02T17:32:13.725000Z TestFramework (WARNING): Not cleaning up dir /var/folders/dc/zhlby57d0vb9yrw5chl5wx6h0000gn/T/bitcoin_func_test_prjmeyz1
2019-08-02T17:32:13.725000Z TestFramework (INFO): Tests successful

It takes 30 seconds to timeout and only then the event base quits.

This PR fixes it but IMO in a wrong way. I will update shortly.

@promag promag force-pushed the 2019-01-loopexit branch 2 times, most recently from c51a4ed to 79da6bd Compare August 3, 2019 01:53
@Sjors
Copy link
Member

Sjors commented Aug 14, 2019

When I run the functional test suite on a RAM disk it's bottlenecked by the slowest running tests. I looked at some of those tests and saw they were stuck on Waiting for HTTP event thread to exit. This also happens without --use-cli, e.g. for wallet_multiwallet.py.

Cherry-picking this commit speeds up wallet_multiwallet.py --usecli from 60-120s to 40s on my machine.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 18, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@promag promag changed the title wip: http: Exit the event loop as soon as there are no active events http: Exit the event loop as soon as there are no active events Jun 29, 2020
@promag promag force-pushed the 2019-01-loopexit branch from 79da6bd to f6064f9 Compare June 29, 2020 22:23
@promag promag changed the title http: Exit the event loop as soon as there are no active events http: Relese server before waiting for event base loop exit Jun 29, 2020
@promag promag force-pushed the 2019-01-loopexit branch from 75c6b76 to d044b7b Compare June 30, 2020 10:35
@promag
Copy link
Contributor Author

promag commented Jun 30, 2020

Now test/functional/feature_abortnode.py runs pretty quick, it doesn't have to wait for timeouts.

@maflcko maflcko changed the title http: Relese server before waiting for event base loop exit http: Release server before waiting for event base loop exit Jun 30, 2020
@promag
Copy link
Contributor Author

promag commented Jul 1, 2020

For now closing in favor of #19420.

@promag promag closed this Jul 1, 2020
@promag promag deleted the 2019-01-loopexit branch July 1, 2020 11:13
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants