-
Notifications
You must be signed in to change notification settings - Fork 38.7k
test: kill process group to avoid dangling processes when using --failfast
#22249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means --failfast no longer works on Windows, but who uses that anyway 🤷♂️
5012bf0 to
451b96f
Compare
|
I can add different code paths for Win and Unix if desirable |
|
review ACK 451b96f mind pushing a temporary commit with a logic error in one of the tests, so that we can observe the CI fails properly? |
|
Concept ACK on addressing the annoying dangling |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
promag
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what is wrong with the kill+wait approach, why does it result in dangling processes.
So, we have three layers of processes, something like Currently To avoid that we can do SIGTERM and wait, but that could lead to CI timeouts. I've tried that first in #19281 Another alternative is to setup signal handler in child process (bitcoind) to listen when parent process exits, but that's feels backward.
Sure, will do |
|
ACK 451b96f. Manual testing with and without --failfast. Failing an assert with --failfast, passes through reviewed code. When fails, exits gracefully stopping of processes. Log
$ test/functional/test_runner.py --failfast
rpc_decodescript.py rpc_deprecated.py rpc_deriveaddresses.py rpc_dumptxoutset.py
OK stdout: stderr: Early exiting after test failure TEST | STATUS | DURATION rpc_decodescript.py | ✖ Failed | 1 s ALL | ✖ Failed | 1 s (accumulated) dangling processes <!-- Print I write to know if it goes through the added part in test_runner.py ---> Failing an assert without --failfast, dont passes through reviewed code and all tests are run. Log$ test/functional/test_runner.py rpc_decodescript.py rpc_deprecated.py rpc_deriveaddresses.py rpc_dumptxoutset.py Temporary test directory at /tmp/test_runner_₿_🏃_20210616_005950 Running Unit Tests for Test Framework Modules .......... ---------------------------------------------------------------------- Ran 10 tests in 0.680s OK stdout: stderr: Remaining jobs: [rpc_deprecated.py, rpc_deriveaddresses.py, rpc_dumptxoutset.py] TEST | STATUS | DURATION rpc_deprecated.py | ✓ Passed | 1 s ALL | ✖ Failed | 6 s (accumulated) |
|
Example failure is here and looks good: https://cirrus-ci.com/build/5678106947092480 Commit can be removed again. |
0a91c85 to
451b96f
Compare
|
Done |
…sses when using `--failfast` 451b96f test: kill process group to avoid dangling processes (S3RK) Pull request description: This is an alternative to bitcoin#19281 This PR fixes a problem when after test failure with `--failfast` option there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures. If there are any dangling nodes left at the end of the test run we kill the whole process group. Pros: the operations is immediate and won't lead to CI timeout Cons: the test_runner process is also killed and exit code is 137 Example output: ``` ... Early exiting after test failure TEST | STATUS | DURATION rpc_decodescript.py | ✓ Passed | 2 s rpc_deprecated.py | ✓ Passed | 2 s rpc_deriveaddresses.py | ✓ Passed | 2 s rpc_dumptxoutset.py | ✖ Failed | 2 s ALL | ✖ Failed | 8 s (accumulated) Runtime: 4 s Killed: 9 > echo $? 137 ``` ACKs for top commit: MarcoFalke: review ACK 451b96f aitorjs: ACK 451b96f. Manual testing with and without **--failfast**. Tree-SHA512: 87e510a1411b9e7571e63cf7ffc8b9a8935daf9112ffc0f069d6c406ba87743ec439808181f7e13cb97bb200fad528589786c47f0b43cf3a2ef0d06a23cb86dd
This is an alternative to #19281
This PR fixes a problem when after test failure with
--failfastoption there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures.If there are any dangling nodes left at the end of the test run we kill the whole process group.
Pros: the operations is immediate and won't lead to CI timeout
Cons: the test_runner process is also killed and exit code is 137
Example output: