Skip to content

Conversation

@jonatack
Copy link
Member

@jonatack jonatack commented Apr 1, 2020

As a follow-up to PR #18247, which added sync_with_ping by default in add_p2p_connection(), this PR allows optionally passing an argument to add_p2p_connection to avoid running sync_with_ping for partial connections.

The second commit passes that argument in cases where add_p2p_connection is followed closely by a call to sync_with_ping or send_with_ping, or where the test integrity appears better served with the previous behavior of opening a partial connection.

As this PR maintains previous behavior WRT a few tests, there should be no downside here.

@laanwj laanwj added the Tests label Apr 1, 2020
@jonatack jonatack force-pushed the allow-peer-conn-sans-sync branch from db53a10 to 2f6705c Compare April 1, 2020 15:41
@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 1, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

for cases where the add_p2p_connection is already followed with a call to
sync_with_ping or send_with_ping, or where the test integrity appears better
served by not performing the sync_with_ping, which was added as a default in
fa90647.
in cases where the add_p2p_connection is already followed with a call to
sync_with_ping or send_with_ping, or where the test integrity appears better
served by not performing the sync_with_ping, which was added as a default in
fa90647.
@jonatack jonatack force-pushed the allow-peer-conn-sans-sync branch from 2a47829 to 522bdf6 Compare April 21, 2020 13:36
@jonatack
Copy link
Member Author

Rebased.

Copy link
Contributor

@robot-visions robot-visions left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this! One question before I finish reviewing: am I correct in understanding that you looked through every use of add_p2p_connection and individually decided whether the special sync_with_ping=False behavior is needed?

@maflcko
Copy link
Member

maflcko commented Apr 23, 2020

Slightly tend to NACK. This is making the tests harder to read (a reader will ask themselves why some calls have the ping disabled explicitly and the call in the very next line does not have it disabled). Also, it is making the tests harder to write or change, because forgetting to update one of the disabled pings to be enabled, when needed, can easily be forgotten. There is no infrastructure in place to catch such mistakes (other than review).

I think we should make tests easy to read and easy to write, not the opposite.

@jonatack
Copy link
Member Author

Thanks for the feeback. Below I'll try to summarize the different aspects as I (correctly or not) see them.

Full connections by default instead of partial connections

add_p2p_connection was added in March 2017 with 5e5725c, which means that for the past 3 years the functional test suite using this method was running without forcing full p2p connections (which were made by calling sync_with_ping). If I understand correctly, the motivation for fa90647 was to hopefully reduce spurious test failures in the CI. Nevertheless, ISTM we would ideally prefer to avoid sync_with_ping everywhere by default and could verify regularly if it can be reverted without increasing the number of false test negatives.

Allowing the previous default of partial connections

It seems we still need a way for tests to make partial connections. One reason is speed/performance: Waste not; gains may appear minor, but the functional suite will continue growing and its run time continue to increase, long-term discouraging people from running it and increasing our dependance on centralized third parties. A more immediately interesting reason is, are there tests that might now be false positives, because their partial connection was changed to a full one to reduce false negatives?

Downside of allowing partial connections

For 3 years, connections were partial by default and required an additional sync_with_ping to become full. Now, connections are full by default and would only require an additional argument be partial. The complexity is roughly equivalent with only the default state changed, but inevitably we need to be able to test various kinds of connections.

In the worst case, a test that was using a partial connection might be modified to not longer need it without the author and reviewers noticing (if the changed test required a full connection, it would fail and be noticed/updated). In that worse case, the connection would function the way the functional test suite did for the previous 3 years. Only if the new test fails would it become an issue needing to be addressed (and the fix is trivial).

Implementation of connection types

I like @robot-visions' suggestion combining the states into one argument. Yes, I looked for cases where a partial connection is a better test or more practical e.g. followed by a sync already. There are likely more of them.

@jonatack
Copy link
Member Author

Question: what changed since March 2017? An increasing number of spurious (maddening, to be sure) CI failures?

@maflcko
Copy link
Member

maflcko commented Apr 24, 2020

Yes, we have a lot of spurious ci failures. No one reports them as bugs or even tries to fix them (except me). Everyone blindly hits the restart button on travis and it is up to me to beat them and save the logs locally. We have bugs reported by travis that crash the whole node, when a miner submits a block: #18742

I don't think making a handful of tests faster by a few microseconds is going to help in any notiecable way. And with the risk of making the test suite more fragile and missing actual failures, I don't think the unspecified gain [1] is worth the risk.

[1] Performance improvements need benchmarks to be useful

@maflcko
Copy link
Member

maflcko commented Apr 24, 2020

Btw, calling it "partial connection" implies that this is a different kind of connection or something that the caller can exploit for a test in some way. This is not true. The verack is already in flight and it can't be erased from the wire or changed in any way. If the caller wanted to exploit that intermittent state for a test, the result would be racy. Races are the number one reason that our tests fail.

If a test needs the verack to be not send (and requires a "partial connection" or whatever the right term for that is), it can use CNodeNoVerackIdle.

@jonatack
Copy link
Member Author

Thanks for the feedback. I empathize with the burden of the CI failures you're dealing with and the need for robust tests and benchmarks. Robust > speed. Will think about this more.

@jonatack jonatack closed this Apr 24, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants