qa: Fix intermittent "Unable to connect to bitcoind" errors on Windows #28509

hebasto · 2023-09-19T20:56:48Z

During my investigation of #28411 and other similar functional test failures on Windows in CI, I found out that

bitcoin/test/functional/test_framework/test_node.py

Line 223 in abe4fed

    
    self.process = subprocess.Popen(self.args + extra_args, env=subp_env, stdout=stdout, stderr=stderr, cwd=cwd, **kwargs)

sometimes fails for reasons unknown to me. By "fails", I mean that the child process does not make any progress.

This PR ensures that the child process is making progress by checking, shortly after launch, that it has created its PID file. If the check fails, the launch is retried up to two more times.
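For illustration only, the launch-and-verify idea can be sketched roughly as below. This is not the code from this PR: the helper name `popen_with_pid_check`, its parameters, and the timeout values are all hypothetical, and the sketch assumes the child writes a PID file at a known path soon after it starts.

```python
import subprocess
import time
from pathlib import Path


def popen_with_pid_check(args, pid_file, attempts=3, check_timeout=5.0,
                         poll_interval=0.05, **popen_kwargs):
    """Hypothetical helper: start a child and confirm progress via its PID file.

    Assumption: the child creates `pid_file` shortly after startup.
    Returns the Popen object once the file is seen; raises RuntimeError
    if every attempt ends without the file appearing.
    """
    pid_file = Path(pid_file)
    for attempt in range(1, attempts + 1):
        # A stale file from an earlier run would look like progress.
        pid_file.unlink(missing_ok=True)
        process = subprocess.Popen(args, **popen_kwargs)

        deadline = time.monotonic() + check_timeout
        while time.monotonic() < deadline:
            if pid_file.exists():
                return process
            if process.poll() is not None:
                break  # child exited before the file was seen
            time.sleep(poll_interval)

        # Final check in case the file appeared just before the loop ended.
        if pid_file.exists():
            return process

        # No sign of progress: clean up this child before retrying.
        if process.poll() is None:
            process.kill()
        process.wait()
        print(f"attempt {attempt}/{attempts}: no PID file at {pid_file}")

    raise RuntimeError(f"child made no progress after {attempts} attempts")
```

With the default `attempts=3`, this gives one initial launch plus two retries, matching the description above. Killing the stalled child before each retry keeps two processes from competing for the same data directory.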

Although this PR fixes tests on Windows, the new logic is platform-agnostic and increases test robustness.

In several dozen runs on GHA in my personal repo, the only intermittent failure that still happens is #28491.

Closes #28411.

DrahtBot · 2023-09-19T20:56:51Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28392 (test: Use pathlib over os path by ns-xvrn)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2023-09-19T21:16:55Z

May be easier to just bump the python version from 3.9 to 3.12 to fix the bug?

hebasto · 2023-09-20T09:50:02Z

The CI failure is #28491 and unrelated to this PR.

hebasto · 2023-09-20T13:58:25Z

... just bump the python version from 3.9 to 3.12...

From Python 3.12 Release Schedule:

Expected:

3.12.0 final: Monday, 2023-10-02

The currently available Python versions in the Windows 2022 image:

3.7.9

3.8.10

3.9.13

3.10.11

3.11.5

maflcko · 2023-09-20T14:18:45Z

Which one are we using right now?

hebasto · 2023-09-20T14:19:43Z

Which one are we using right now?

On Windows, it is 3.11.5.

maflcko · 2023-09-20T14:49:48Z

Ok, so the issue is probably not due to a too-old Python version.

fanquake · 2023-09-24T16:22:11Z

Concept ~0. A bunch of extra code in the test-framework, to fix a not-yet-identified, Windows only issue.

the new logic is platform-agnostic and increases test robustness.

Can you elaborate on how this increases robustness for non-Windows platforms, if they are already working?

hebasto · 2023-09-24T16:48:03Z

Concept ~0. A bunch of extra code in the test-framework, to fix a not-yet-identified, Windows only issue.

We already have an entire directory with code that serves similar purposes in our CI.
We already have a bunch of platform-specific code in the test-framework.
The issue has been identified (please refer to the PR description), but its cause has not yet been determined.
Of course, it would be great if someone identifies it. And then this workaround can be dropped.

Can you elaborate on how this increases robustness for non-Windows platforms, if they are already working?

If some similar issues will happen for non-Windows platform in the future, they won't break the tests.

fanquake · 2023-09-24T16:51:10Z

If some similar issues will happen for non-Windows platform in the future, they won't break the tests.

You mean the issues will just be hidden / less-likely to be identified & debugged?

hebasto · 2023-09-24T16:55:55Z

@fanquake

If some similar issues will happen for non-Windows platform in the future, they won't break the tests.

You mean the issues will just be hidden / less-likely to be identified & debugged?

This PR adds additional logging and exceptions.

What do you suggest?

fanquake · 2023-09-24T17:02:49Z

What do you suggest?

I would suggest we figure out why Python doesn't work on Windows, or at least, doesn't work when run in the GitHub CI, and fix it in a targeted way (while reporting the issue upstream), with the intention to drop the workaround as soon as a newer version of Python is available, rather than inject all this new code, into the test framework, where it affects all platforms.

hebasto · 2023-09-25T18:10:21Z

I would suggest we figure out why Python doesn't work on Windows, or at least, doesn't work when run in the GitHub CI...

I started to think that the issue is specific to GHA CI as I cannot reproduce it locally.

maflcko · 2023-09-29T12:45:39Z

Could it make sense to disable the functional tests on Windows for pull requests and only run them on master?

This means that issues will be caught at a later stage only, but I'd suspect they are easy to fixup post-merge.

Overall this may be less work than having someone re-run the CI on all affected pull request or having people ignore the Windows CI anyway.

fanquake · 2023-10-02T09:14:30Z

Yea, I think this might be the right thing to do (for now). Persistent random red CI is pointless, and confusing for contributors. It's a shame that Windows Python doesn't seem to work on GitHub, but we also aren't going to make all the changes here to work around that.

…windows in master

aba4a58 ci: Only run functional tests on windows in master (Fabian Jahr)

Pull request description:

  This idea was discussed [here](bitcoin/bitcoin#28509 (comment)).

ACKs for top commit:
  hebasto:
    ACK aba4a58

Tree-SHA512: 89fd6352b585bae3538d5350b0404c216a8225fe356d408c1ebe3394e7b9a190d65639f4eef310056e020909928d7a1f2de25585c97d2ac087d1a9f72af281eb


DrahtBot added the Tests label Sep 19, 2023

hebasto force-pushed the 230919-subprocess branch from 5c9eab5 to d8dd6cc Compare September 19, 2023 21:05

DrahtBot added the CI failed label Sep 19, 2023

DrahtBot mentioned this pull request Sep 20, 2023

test: Use pathlib over os path #28392

Merged

hebasto force-pushed the 230919-subprocess branch from d8dd6cc to 3c018fc Compare September 20, 2023 07:36

hebasto marked this pull request as draft September 20, 2023 08:03

hebasto force-pushed the 230919-subprocess branch from 3c018fc to 89cc937 Compare September 20, 2023 08:16

hebasto marked this pull request as ready for review September 20, 2023 09:49

qa: Ensure subprocess.Popen(bitcoind) succeeds

f6c3419

hebasto force-pushed the 230919-subprocess branch from 89cc937 to f6c3419 Compare September 20, 2023 11:30

DrahtBot removed the CI failed label Sep 20, 2023

hebasto mentioned this pull request Sep 20, 2023

Fix virtual size limit enforcement in transaction package context #28471

Merged

hebasto closed this Oct 2, 2023

fjahr mentioned this pull request Oct 3, 2023

ci: Only run functional tests on native windows in master #28567

Merged

jlopp mentioned this pull request Oct 4, 2023

bugfix: throw an error if an invalid parameter is passed to getnetworkhashps RPC #28554

Merged

amitiuttarwar mentioned this pull request Oct 18, 2023

net: improve max-connection limits code #28464

Merged

hebasto mentioned this pull request Aug 27, 2024

ci: ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it #30390

Closed

bitcoin locked and limited conversation to collaborators Oct 1, 2024

4 participants
