Skip to content

Conversation

@stratospher
Copy link
Contributor

@stratospher stratospher commented Dec 6, 2023

An address is placed in a [bucket,position] in the addrman table (new table or tried table) using the addpeeraddress RPC. This [bucket,position] is calculated using nKey(and other metrics) for the addrman which is chosen randomly during every run.

Supposing there are 2 addresses to be placed in an addrman table. During every test run, a different [bucket,position] would be calculated for each address.These calculated [bucket,position] could even be the same for the 2 addresses in some test runs and result in collisions in the addrman. We wouldn't be able to predict when the collisions are going to happen because we can't predict the nKey value which is chosen at random. This can cause flaky tests.

Because of these non deterministic collisions, we are limited in what we can do to test addrman functionality. Currently in our tests don't add a second address to prevent these collisions from happening - we only keep 1 address in the new table and 1 address in the tried table. See #26988 (comment), #23084, #22831(comment).

This PR lets us create a deterministic addrman with fixed nKey so that we can know the [bucket,position] collisions beforehand, safely add more addresses in an addrman table and write more extensive tests.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 6, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK 0xB10C, amitiuttarwar, mzumsande, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28998 (rpc: addpeeraddress tried return error on failure by 0xB10C)
  • #28802 (ArgsManager: support subcommand-specific options by ajtowns)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/init.cpp Outdated
argsman.AddArg("-limitdescendantcount=<n>", strprintf("Do not accept transactions if any ancestor would have <n> or more in-mempool descendants (default: %u)", DEFAULT_DESCENDANT_LIMIT), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
argsman.AddArg("-limitdescendantsize=<n>", strprintf("Do not accept transactions if any ancestor would have more than <n> kilobytes of in-mempool descendants (default: %u).", DEFAULT_DESCENDANT_SIZE_LIMIT_KVB), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
argsman.AddArg("-addrmantest", "Allows to test address relay on localhost", ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
argsman.AddArg("-addrtest=<option>", strprintf("Test general address related functionalities. Only used in tests. When -addrtest=relay is specified, test address relay on localhost. When -addrtest=addrman is specified, use a deterministic addrman. (default: %s)", "relay"), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can be generalized to pass test-only options (along with a helper HasTestOption, similar to IsDeprecatedRPCEnabled?

The user-facing debug-help output of this seems the wrong place to document test-only options, and the source code seems sufficient to be self-documenting.

Suggested change
argsman.AddArg("-addrtest=<option>", strprintf("Test general address related functionalities. Only used in tests. When -addrtest=relay is specified, test address relay on localhost. When -addrtest=addrman is specified, use a deterministic addrman. (default: %s)", "relay"), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
argsman.AddArg("-test=<option>", "Pass a test-only option", ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, (default: %s)", "relay" is wrong, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can be generalized to pass test-only options (along with a helper HasTestOption, similar to IsDeprecatedRPCEnabled?

what would a function like HasTestOption do though? we could introduce it here to check if we're using a known test option("relay", "addrman"). even if we use unknown test options, we don't need to throw an error right.

In any case, (default: %s)", "relay" is wrong, no?

done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would a function like HasTestOption do though? we could introduce it here to check if we're using a known test option("relay", "addrman"). even if we use unknown test options, we don't need to throw an error right.

It would just be a one-line function to hold the code you already wrote in two places:

return any_of(args.GetArgs("test"), test_option);

Copy link
Contributor

@mzumsande mzumsande Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to move other existing test-only options (-acceptstalefeeestimates or -fastprune come to mind) under this command?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. (Not here, obviously, but in a follow-up possibly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would just be a one-line function to hold the code you already wrote in two places:

oh got it now.

I wonder if this can be generalized to pass test-only options (along with a helper HasTestOption, similar to IsDeprecatedRPCEnabled?

good idea - liked how we can make sure users don't accidentally use these test-only options in their config file.
done.

@stratospher stratospher force-pushed the deterministic-addrman branch from 964e1a8 to 0545981 Compare December 7, 2023 04:32
@stratospher
Copy link
Contributor Author

thanks @maflcko, @0xB10C, @brunoerg. I've updated the PR to address your comments. Open to more thoughts/suggestions in #29007 (comment).

Comment on lines 457 to 477
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about how to clear the addrman tables in a functional test. I think it would be ideal if we could avoid the addpeeraddress test leaking into getaddrmaninfo and getrawaddrman tests.

This also becomes relevant in the context of #28998 where we might want to produce a tried table collision on purpose to have coverage of the addpeeraddress RPC failure path. The easiest way of producing a collision is by adding more and more addresses to the tried table (as we do in the unit test). This would leave us with quite a few additional addresses already in the tried table.

I currently see two possibilities for clearing the addrman tables between tests:

  1. shutdown the node, delete peers.dat, and restart
  2. have a third node for this test that's only used for getrawaddrman.

Copy link
Contributor

@0xB10C 0xB10C Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're doing option 1. in the feature_addrman.py functional test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be ideal if we could avoid the addpeeraddress test leaking into getaddrmaninfo and getrawaddrman tests.

it would be cleaner to not need to know the existing context of addrman when dealing with getaddrmaninfo/getrawaddrman tests. both the options sound good to me!

The easiest way of producing a collision is by adding more and more addresses to the tried table (as we do in the unit test). This would leave us with quite a few additional addresses already in the tried table.

these collisions would only affect future tests added after getaddrmaninfo/getrawaddrman tests though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way of producing a collision is by adding more and more addresses to the tried table (as we do in the unit test). This would leave us with quite a few additional addresses already in the tried table.

these collisions would only affect future tests added after getaddrmaninfo/getrawaddrman tests though.

We'd need to clean up after we found a collision.

@DrahtBot DrahtBot mentioned this pull request Dec 7, 2023
8 tasks
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Needs rebase due to a silent conflict with #27581 (feature_asmap.py).

@stratospher stratospher force-pushed the deterministic-addrman branch 3 times, most recently from 5481208 to d3c2418 Compare December 16, 2023 07:03
Copy link
Contributor

@amitiuttarwar amitiuttarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approach ACK. yay for finally having addrman determinism in the functional tests 🙌🏽
couple thoughts to consider-

1. clearer subtest addrman setup in rpc_net.py

@0xB10C

I think it would be ideal if we could avoid the addpeeraddress test leaking into getaddrmaninfo and getrawaddrman tests.

agreed. right now, the 3 subtests (test_addpeeraddress, test_getaddrmaninfo & test_getrawaddrman) have a lot of implicit dependencies, making the overall test harder to understand & develop on.

here's a suggestion with the intention of making the addrman setup more clear in each subtest:

  • introduce a helper, eg. seed_addrman which uses addpeeraddress to add addresses from all the different networks to addrman.
  • in the beginning of test_getaddrmaninfo & test_getrawaddrman, get a clean addrman (by using either method mentioned by @0xB10C) and call seed_addrman to populate it in an expected way. then each subtest would check the results with the relevant RPC

2. remove addrmantest / test=localaddr

I am not convinced that this test-only behavior is worth maintaining. the only thing that p2p_node_network_limited.py is testing is that 1. an addr message is received & 2. the expected services are advertised. # 1 is forced to true via the test-only codepath in GetLocalAddrForPeer, and the value of # 2 is nominal due to how easily the test-only code could diverge from the mainnet logic.

My vote is to remove this functionality & related code. Curious to hear what you think @stratospher, as well as the opinions of other reviewers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these init params are added here, but are only relevant until a couple subtests later in test_getaddrmaninfo. having unrelated init params can lead to unexpected behaviors, like in this recent example. implicit dependencies between subtests can also make future development confusing, such as updating setup in an earlier test unexpectedly causing failures in seemingly unrelated tests.

leaving suggestions for improving the structure between subtests in the overall review comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense! added an option in restart_node() to restart the node with an empty addrman which could potentially be useful in other tests. addresses added in addpeeraddress test don't leak into addrman tests now.

src/init.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach of using -test to group test-only options

two suggestions for improvement:

  • include the options in the help text (eg. how -onlynet or -debug handles)
  • throw some sort of error (or at least warning?) if an invalid option is passed through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, i've included both improvements!

@stratospher stratospher force-pushed the deterministic-addrman branch 2 times, most recently from 20a8427 to 1b49149 Compare December 21, 2023 06:16
@stratospher
Copy link
Contributor Author

stratospher commented Dec 21, 2023

thanks for the thoughtful feedback @amitiuttarwar, @0xB10C! I've updated the PR to address your comments.

Main changes were: (git diff)

  1. adding an option to restart the node with an empty addrman (method 1 suggested here test: create deterministic addrman in the functional tests #29007 (comment)) to prevent addpeeraddress test leaking into getaddrmaninfo and getrawaddrman tests.
  2. listing options in -help-debug for -test=<option>

I am not convinced that this test-only behavior is worth maintaining. the only thing that p2p_node_network_limited.py is testing is that 1. an addr message is received & 2. the expected services are advertised. # 1 is forced to true via the test-only codepath in GetLocalAddrForPeer, and the value of # 2 is nominal due to how easily the test-only code could diverge from the mainnet logic.

My vote is to remove this functionality & related code. Curious to hear what you think @stratospher, as well as the opinions of other reviewers.

yes, i agree. don't think it's useful to have a command line arg which is used in only 1 line in 1 test. we would want to know if the address with expected services are really advertised in real world logic and not in hard coded test path logic. looks like the option was introduced in this commit.

@stratospher stratospher force-pushed the deterministic-addrman branch 2 times, most recently from b7ec1fc to 291a1a3 Compare December 22, 2023 06:25
@stratospher
Copy link
Contributor Author

Updated the PR to remove -addrmantest (git diff) in 5d27993.

@stratospher stratospher force-pushed the deterministic-addrman branch from 291a1a3 to d8252df Compare January 2, 2024 08:20
@amitiuttarwar
Copy link
Contributor

ACK 2cc8ca1

reviewed the code & tested the init option

@DrahtBot DrahtBot removed the request for review from amitiuttarwar February 8, 2024 06:57
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 2cc8ca1


if (args.IsArgSet("-test")) {
if (chainparams.GetChainType() != ChainType::REGTEST) {
return InitError(Untranslated("-test=<option> should only be used in functional tests"));
Copy link
Contributor

@mzumsande mzumsande Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had the same suggestion independently (we have functional tests that don't use regtest but testnet)

auto addrman{std::make_unique<AddrMan>(netgroupman, /*deterministic=*/false, /*consistency_check_ratio=*/check_addrman)};
bool deterministic = HasTestOption(args, "addrman"); // use a deterministic addrman only for tests

auto addrman{std::make_unique<AddrMan>(netgroupman, /*deterministic=*/deterministic, /*consistency_check_ratio=*/check_addrman)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't know if we have recomendations about this, but I find /*deterministic=*/deterministic a bit silly, might as well drop the named argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

included in #29636

@DrahtBot DrahtBot requested review from mzumsande and removed request for mzumsande February 13, 2024 18:37
@maflcko maflcko added this to the 28.0 milestone Feb 22, 2024
@achow101
Copy link
Member

ACK 2cc8ca1

Comment on lines +48 to +55
node.addpeeraddress(address="1.2.3.4", tried=True, port=8333)
node.addpeeraddress(address="2.0.0.0", port=8333)
node.addpeeraddress(address="1233:3432:2434:2343:3234:2345:6546:4534", tried=True, port=8333)
node.addpeeraddress(address="2803:0:1234:abcd::1", port=45324)
node.addpeeraddress(address="fc00:1:2:3:4:5:6:7", port=8333)
node.addpeeraddress(address="pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion", tried=True, port=8333)
node.addpeeraddress(address="nrfj6inpyf73gpkyool35hcmne5zwfmse3jl3aw23vk7chdemalyaqad.onion", port=45324, tried=True)
node.addpeeraddress(address="c4gfnttsuwqomiygupdqqqyy5y5emnk5c73hrfvatri67prd7vyq.b32.i2p", port=8333)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should check that these addresses are successfully added to the addrman. They work with the current deterministic addrman, but this might break silently when the addrman bucket/position calculation changes and these cause a collision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

included in #29636

@0xB10C
Copy link
Contributor

0xB10C commented Mar 12, 2024

Maybe this reduces someone's time spent debugging: I noticed in #28998 that restarting a node with -test=addrman doesn't make an existing addrman deterministic. You also need to set clear_addrman=True to start off with fresh and deterministic addrman.

self.restart_node(x, ["-test=addrman"], clear_addrman=True)

@0xB10C 0xB10C mentioned this pull request Mar 12, 2024
0xB10C added a commit to 0xB10C/bitcoin that referenced this pull request Mar 23, 2024
Just having deterministic is enough. See bitcoin#29007 (comment)
0xB10C added a commit to 0xB10C/bitcoin that referenced this pull request Mar 23, 2024
fanquake added a commit that referenced this pull request Mar 25, 2024
9a44a20 init: clarify -test error (0xb10c)
3047c3e addrman: drop /*deterministic=*/ comment (0xb10c)
89b84ea test: check that addrman seeding is successful (0xb10c)

Pull request description:

  A few, small follow-ups to #29007. See commit messages for details.

ACKs for top commit:
  maflcko:
    lgtm ACK 9a44a20
  stratospher:
    tested ACK 9a44a20.
  mzumsande:
    Code Review ACK 9a44a20

Tree-SHA512: 987245e035da08fa7fe541a1dc3b7c2d90f703a6f9813875048d286335c63ffa5201db702a3f368087c00fa02c3fdafb06cf54dc7a92922749a94588b1500e98
janus pushed a commit to BitgesellOfficial/bitgesell that referenced this pull request Apr 6, 2024
janus pushed a commit to BitgesellOfficial/bitgesell that referenced this pull request Apr 6, 2024
@stratospher stratospher deleted the deterministic-addrman branch July 17, 2024 06:39
knst pushed a commit to knst/dash that referenced this pull request Aug 29, 2024
…e functional tests

BACKPORT NOTICE
It includes only this commit: be25ac3

[init] Remove -addrmantest command line arg

-addrmantest is only used in `p2p_node_network_limited.py` test to
test if the node self-advertises a hard-coded local address
(which wouldn't be advertised in the tests because it's unroutable
without the test-only code path) to check pruning-related services
are correct in that addr.

Remove -addrmantest because the self advertisement happens because
of hard coded test path logic, and expected services are nominal
due to how easily the test-only code could diverge from mainnet
logic. It's also being used only in 1 test.
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Aug 30, 2024
…29007 - to improve CI experience

2eadcf2 partial Merge bitcoin#29007: test: create deterministic addrman in the functional tests (stratospher)
f95ca4e fix: unify with bitcoin: removed requirement of txindex=0 (Konstantin Akimov)
bf46e7a partial Merge bitcoin#28799: wallet: cache descriptor ID to avoid repeated descriptor string creation (Andrew Chow)
05e5966 partial Merge bitcoin#27920: wallet: bugfix, always use apostrophe for spkm descriptor ID (furszy)

Pull request description:

  ## Issue being fixed or feature implemented
  Lately our CI for tsan is flapping many functional tests and take long times.

  This PR has several important changes backported from the latest bitcoin's version to improve CI experience

  ## What was done?
  This PR has several backports that improved CI experience drastically.

  **Firstly, it aims to fix flapping test p2p_node_network_limited.py**

  For example: https://gitlab.com/dashpay/dash/-/jobs/7692635307

  <details>

  <summary>p2p_node_network_limited.py                        | ✖ Failed  | 28 s</summary>

  ```
   test  2024-08-29T02:50:53.929000Z TestFramework (ERROR): Assertion failed
  Traceback (most recent call last):
  File "/builds/dashpay/dash/build-ci/dashcore-linux64_tsan/test/functional/test_framework/test_framework.py", line 158, in main
    self.run_test()
  File "/builds/dashpay/dash/build-ci/dashcore-linux64_tsan/test/functional/p2p_node_network_limited.py", line 79, in run_test
    self.nodes[0].disconnect_p2ps()
  File "/builds/dashpay/dash/build-ci/dashcore-linux64_tsan/test/functional/test_framework/test_node.py", line 611, in disconnect_p2ps
    wait_until_helper(check_peers, timeout=5)
  File "/builds/dashpay/dash/build-ci/dashcore-linux64_tsan/test/functional/test_framework/util.py", line 262, in wait_until_helper
    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
    AssertionError: Predicate ''''
         def check_peers():
             for p in self.getpeerinfo():
                 for p2p in self.p2ps:
                     if p['subver'] == p2p.strSubVer:
                         return False
             return True
  ''' not true after 5.0 seconds
  ```
  </details>

  **Secondly, it improves performance of Descriptor wallets significantly for case of `tsan` CI**. It is tiny improvement for Release build and local runs, but some fucnctional tests run as fast as twice:

  https://gitlab.com/dashpay/dash/-/jobs/7694458953
  https://gitlab.com/dashpay/dash/-/jobs/7665132625

  ```
  wallet_create_tx.py --descriptors                  | ✓ Passed  | 236 s <-- new version
  wallet_create_tx.py --legacy-wallet                | ✓ Passed  | 108 s
  wallet_basic.py --descriptors                      | ✓ Passed  | 135 s <---- new version
  wallet_basic.py --legacy-wallet                    | ✓ Passed  | 97 s

  wallet_create_tx.py --descriptors                  | ✓ Passed  | 456 s <-- old version
  wallet_create_tx.py --legacy-wallet                | ✓ Passed  | 98 s
  wallet_basic.py --descriptors                      | ✓ Passed  | 189 s <--- old version
  wallet_basic.py --legacy-wallet                    | ✓ Passed  | 131 s
  ```
  See performance investigation here: #6226

  ## How Has This Been Tested?
  Run unit/functional tests

  ## Breaking Changes
  N/A

  ## Checklist:
  - [x] I have performed a self-review of my own code
  - [ ] I have commented my code, particularly in hard-to-understand areas
  - [ ] I have added or updated relevant unit/integration/functional/e2e tests
  - [ ] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone

ACKs for top commit:
  PastaPastaPasta:
    utACK 2eadcf2
  UdjinM6:
    utACK 2eadcf2

Tree-SHA512: 127fbaa65c160aa95e2145a6b40d3811f7c42e36fbee9ce98a9ac021abd9cbe6edc7791870b331a54855ba891e3804885db7936ef212647b693f50f79a60d232
@bitcoin bitcoin locked and limited conversation to collaborators Jul 18, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants