-
Notifications
You must be signed in to change notification settings - Fork 38.7k
test: fix dangling bitcoind in functional tests #19281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. A couple nits below, but no need to change anything until other reviewers agree with this.
jonatack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. A couple nits below, but no need to change anything until other reviewers agree with this.
…acOS 3a7e794 test: retry when write to a socket fails on macOS (Ivan Metlushko) 8cf9d15 test: use pgrep for better compatibility (Ivan Metlushko) Pull request description: Rationale: a few minor changes to make experience of running tests on macOS a bit better 1.`pidof` is not available on BSD/macOS, while `pgrep` is present on BSD, Linux and macOS 2. Add retry as a workaround for a weird behavior when writing to a socket (https://bugs.python.org/issue33450). Stacktrace attached Man pages: https://www.freebsd.org/cgi/man.cgi?query=pgrep&apropos=0&sektion=1&manpath=FreeBSD+6.0-RELEASE&arch=default&format=html https://man7.org/linux/man-pages/man1/pgrep.1.html Related to #19281 Stacktrace example: ``` ... 33/161 - feature_abortnode.py failed, Duration: 63 s stdout: 2020-06-11T10:46:43.947000Z TestFramework (INFO): Initializing test directory /var/folders/2q/d5w9zh614r7g5c8r74ln3g400000gq/T/test_runner_₿_🏃_20200611_174102/feature_abortnode_128 2020-06-11T10:46:45.199000Z TestFramework (INFO): Waiting for crash 2020-06-11T10:47:15.921000Z TestFramework (INFO): Node crashed - now verifying restart fails 2020-06-11T10:47:47.068000Z TestFramework (INFO): Stopping nodes [node 1] Cleaning up leftover process stderr: Traceback (most recent call last): File "/Users/xxx/Projects/bitcoin/test/functional/feature_abortnode.py", line 50, in <module> AbortNodeTest().main() File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/test_framework.py", line 142, in main exit_code = self.shutdown() File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/test_framework.py", line 266, in shutdown self.stop_nodes() File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/test_framework.py", line 515, in stop_nodes node.stop_node(wait=wait) File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/test_node.py", line 318, in stop_node self.stop(wait=wait) File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/coverage.py", line 47, in __call__ return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs) File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 142, in __call__ response, status = self._request('POST', self.__url.path, postdata.encode('utf-8')) File "/Users/xxx/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 107, in _request self.__conn.request(method, path, postdata, headers) File "/Users/xxx/.pyenv/versions/3.5.6/lib/python3.5/http/client.py", line 1107, in request self._send_request(method, url, body, headers) File "/Users/xxx/.pyenv/versions/3.5.6/lib/python3.5/http/client.py", line 1152, in _send_request self.endheaders(body) File "/Users/xxx/.pyenv/versions/3.5.6/lib/python3.5/http/client.py", line 1103, in endheaders self._send_output(message_body) File "/Users/xxx/.pyenv/versions/3.5.6/lib/python3.5/http/client.py", line 936, in _send_output self.send(message_body) File "/Users/xxx/.pyenv/versions/3.5.6/lib/python3.5/http/client.py", line 908, in send self.sock.sendall(data) OSError: [Errno 41] Protocol wrong type for socket ``` ACKs for top commit: laanwj: ACK 3a7e794 Tree-SHA512: fefbe40ce94ab29f18bbbed2a434194b1384ffa5279b1d04db7a3708e3dd422bd9e450f1db3f95a1a851fac5a626ab533c6ebcfd7ede96f8ccae9e6f3e9fff92
Test runner sends SIGTERM instead of SIGKILL to a running test to allow proper shutdown of nodes.
2709209 to
bb44fd9
Compare
|
Fixed nits |
|
Concept ACK |
|
ACK bb44fd9
Tested on OSX 11.1 |
|
ACK bb44fd9 (rebased on 3ac5209).
|
|
I think |
|
@MarcoFalke would you concept ack this if I add a timeout to |
|
There shouldn't be any timeout. The whole process tree should be killed immediately. If there was a simple way in python to do that, I'd prefer that. |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
…en using `--failfast` 451b96f test: kill process group to avoid dangling processes (S3RK) Pull request description: This is an alternative to #19281 This PR fixes a problem when after test failure with `--failfast` option there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures. If there are any dangling nodes left at the end of the test run we kill the whole process group. Pros: the operations is immediate and won't lead to CI timeout Cons: the test_runner process is also killed and exit code is 137 Example output: ``` ... Early exiting after test failure TEST | STATUS | DURATION rpc_decodescript.py | ✓ Passed | 2 s rpc_deprecated.py | ✓ Passed | 2 s rpc_deriveaddresses.py | ✓ Passed | 2 s rpc_dumptxoutset.py | ✖ Failed | 2 s ALL | ✖ Failed | 8 s (accumulated) Runtime: 4 s Killed: 9 > echo $? 137 ``` ACKs for top commit: MarcoFalke: review ACK 451b96f aitorjs: ACK 451b96f. Manual testing with and without **--failfast**. Tree-SHA512: 87e510a1411b9e7571e63cf7ffc8b9a8935daf9112ffc0f069d6c406ba87743ec439808181f7e13cb97bb200fad528589786c47f0b43cf3a2ef0d06a23cb86dd
|
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
…sses when using `--failfast` 451b96f test: kill process group to avoid dangling processes (S3RK) Pull request description: This is an alternative to bitcoin#19281 This PR fixes a problem when after test failure with `--failfast` option there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures. If there are any dangling nodes left at the end of the test run we kill the whole process group. Pros: the operations is immediate and won't lead to CI timeout Cons: the test_runner process is also killed and exit code is 137 Example output: ``` ... Early exiting after test failure TEST | STATUS | DURATION rpc_decodescript.py | ✓ Passed | 2 s rpc_deprecated.py | ✓ Passed | 2 s rpc_deriveaddresses.py | ✓ Passed | 2 s rpc_dumptxoutset.py | ✖ Failed | 2 s ALL | ✖ Failed | 8 s (accumulated) Runtime: 4 s Killed: 9 > echo $? 137 ``` ACKs for top commit: MarcoFalke: review ACK 451b96f aitorjs: ACK 451b96f. Manual testing with and without **--failfast**. Tree-SHA512: 87e510a1411b9e7571e63cf7ffc8b9a8935daf9112ffc0f069d6c406ba87743ec439808181f7e13cb97bb200fad528589786c47f0b43cf3a2ef0d06a23cb86dd

This PR fixes a problem when after failure of a test with
--failfastoption there are dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures.The fix: Introduce new test status —
cancelledand register a signal handler that will raise TestCancelled exception to properly handle termination. Test runner now sends SIGTERM instead of SIGKILL.