-
Notifications
You must be signed in to change notification settings - Fork 38.7k
p2p: track AddrMan totals by network and table, improve precision of adding fixed seeds #26847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
approach ACK. relatively simple patch to ameliorate the issue of address discovery when switching networks |
src/addrman.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this dependency, should we drop nNew and nTried altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's not so straightforward - nNew and nTried are part of the serialization after all. Also, I don't like that then we'd need to build the sum over all networks every time we needed nNew or nTried (as the check code does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't like that then we'd need to build the sum over all networks every time we needed nNew or nTried (as the check code does).
Well, I don't think something being inside the Check is a good justification — it's the other way around, Check is needed to make sure those values are sane. Furthermore, the sum over all networks is 5 addition operations, right?
However, I agree that dropping these values might be not trivial due to serialization. Not sure I'll find time to prototype that, but maybe someone will :) Otherwise feel free to close this suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, see this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the check such that it no longer compares with nNew and nTried but does a recount, see below for more context (of course the relation is still there...).
brunoerg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
85dc0c9 to
18acfdf
Compare
src/addrman.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good if the sanity check here actually recomputed the totals per-network (by iterating over the actual addrman entries).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to actually recompute the totals - while doing so I noticed that it is possible that it's possible that there are "empty" entries (0 new and 0 tried) in m_map_counts if an addr of some network was first added and later removed again from. That's not a problem in my opinion, but it means that the size of the map is not necessarily reliable. The changed check should work with this, potentially also adding an entry empty to local_counts.
|
Concept/approach ACK |
18acfdf to
c75ca24
Compare
src/addrman.cpp
Outdated
| return -20; | ||
| } | ||
| for (const auto& [net, count] : m_map_counts) { | ||
| if (local_counts[net].first != count.first || local_counts[net].second != count.second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably too crazy, but I was thinking what if these values overflow (from ++) and are not equal anymore due to UB? I'm actually not sure what's our attitude towards such risks across the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It definitely shouldn't be possible and would mean that there is a serious bug in addrman, but I think that within the Check() code, we definitely want to abort. After all, Check() doesn't normally run in production (DEFAULT_ADDRMAN_CONSISTENCY_CHECKS == 0) and needs to be activated by the user.
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
src/net.cpp
Outdated
| std::vector<Network> CConnman::GetRelevantFixedSeedNetworks() const | ||
| { | ||
| std::vector<Network> networks{}; | ||
| for (int n = 0; n < NET_MAX; n++) { | ||
| enum Network net = (enum Network)n; | ||
| if (net == NET_UNROUTABLE || net == NET_INTERNAL) continue; | ||
| if (IsReachable(net) && addrman.size(net, std::nullopt) == 0) { | ||
| networks.push_back(net); | ||
| } | ||
| } | ||
| return networks; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be a method of CConnman because it does not access any of its members (other than addrman). Would it be more appropriate to have it as AddrMan method? Then the public interface of AddrMan need not be changed by this PR. Edit: I mean no need to add this AddrMan method size(std::optional<Network> net, std::optional<bool> in_new).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this belongs into AddrMan because of the IsReachable - I think the information which networks are reachable belongs into net, and possibly at some later point into CConnman (I know vfLimited is currently a global, but this might change). I don't think it makes sense that AddrMan should know or deal with this information, so I think I prefer leaving this out of AddrMan and reducing the change of the public interface by combining the size functions as suggested by Amiti below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a standalone function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be ok with that, but then we'd have to pass it addrman, which is a member of CConnman, so I'm not sure why that is better.
|
code looks good. I went through all the addrman touch points to the new & tried tables to double check nothing was missed. tried my best to think through edge cases of what could go wrong and don't have anything :) left a bunch of review comments that are all style related, no blockers. another style related suggestion: what do you think about consolidating the two |
| if (!net.has_value()) { | ||
| if (!in_new.has_value()) { | ||
| return nNew + nTried; | ||
| } else { | ||
| return *in_new ? nNew : nTried; | ||
| } | ||
| } | ||
| if (auto it = m_map_counts.find(*net); it != m_map_counts.end()) { | ||
| auto entry = it->second; | ||
| if (!in_new.has_value()) { | ||
| return entry.first + entry.second; | ||
| } else { | ||
| return *in_new ? entry.first : entry.second; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style suggestion to consider. mainly, reversing the logic blocks in the if statements to reduce the ! indirections of nested logic.
code
| if (!net.has_value()) { | |
| if (!in_new.has_value()) { | |
| return nNew + nTried; | |
| } else { | |
| return *in_new ? nNew : nTried; | |
| } | |
| } | |
| if (auto it = m_map_counts.find(*net); it != m_map_counts.end()) { | |
| auto entry = it->second; | |
| if (!in_new.has_value()) { | |
| return entry.first + entry.second; | |
| } else { | |
| return *in_new ? entry.first : entry.second; | |
| } | |
| } | |
| if (!net) { | |
| if (in_new.has_value()) { | |
| return *in_new ? nNew : nTried; | |
| } else { | |
| return nNew + nTried; | |
| } | |
| } | |
| if (auto it = m_map_counts.find(*net); it != m_map_counts.end()) { | |
| auto entry = it->second; | |
| if (in_new.has_value()) { | |
| return *in_new ? entry.first : entry.second; | |
| } else { | |
| return entry.first + entry.second; | |
| } | |
| } | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also just noting that you use nNew + nTried to represent size of unique addresses in addrman. this breaks from the convention of using vRandom.size(), but makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took your suggestion wrt to the order.
Yes, these need to be identical, but I actually changed it to use the old vRandom.size() used in size() so that there isn't a change from the status quo when the two size() functions are merged as you suggest below.
Oh, well.
:) I guess void IncNew(ssize_t delta, Network net)
{
nNew += delta;
m_map_count[net].first += delta;
}Or ditch |
|
ah, just to clarify, I meant I couldn't identify anything that could go wrong in the current version of the code. ofc things can always go awry in the future, which is why code style is useful to help reduce the chances of that happening :)
agreed that the counts of the code added to [*] for eg. other than unserialization, all additions to the new table go through
this one I'm more open to. martin pointed out some difficulties in that approach here, and I personally think its nbd either way. |
84dfadc to
f93ff50
Compare
|
Thanks a lot for the reviews! I think I have addressed all comments and implemented most of the suggestions - let me know if I missed something!
I did that, and combined it with @vasild's suggestion to use upper case notation
Yes, since it's part of the serialization we can't completely do without it. Of course, we could decide to just use |
f93ff50 to
1484e19
Compare
src/addrman.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local_count[net] could be called on the non-existent key, but that's fine and will return 0, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a test for this case would be great...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local_count[net]could be called on the non-existent key, but that's fine and will return 0, right?
yes, that's correct - it will create a entry in local_counts with 0, and then return 0. I think that's actually already happening in some unit test I think - but I will try to add a more explicit test soon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I should have tried to break it and see whether there is a such test. Explicit is better of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to happen, we need to have an address in AddrMan for a network at one point in time, which later gets removed, so that we have an entry in m_network_counts that has only zeroes, but no entry in local_counts.
This happens in the addrman_tests/remove_invalid unit test which tampers with the serialized stream, so that an address gets added and then removed during Deserialization - if you break CheckAddrman() by adding an assert at this spot, this subtest will fail.
I think it could also happen theoretically if the last remaining address from network A is being kicked out by an address from network B due to a collision in the new tables. But this doesn't seem trivial to write a unit test for, since addresses from different networks are typically from different groups, so I'm not sure if it's easily possible to write an explicit test for this.
… addresses Previously, we'd only load fixed seeds if we'd not know any addresses at all. This change makes it possible to change -onlynet abruptly, e.g. from -onlynet=onion to -onlynet=i2p and still find peers.
Create() is only called in one spot, so this doesn't change behavior. Co-authored-by: Amiti Uttarwar <[email protected]>
The functionality of the old size() is covered by the new Size() when no arguments are specified, so this does not change behavior. Co-authored-by: Martin Zumsande <[email protected]>
1484e19 to
80f39c9
Compare
|
utACK 80f39c9 |
|
ACK 80f39c9 |
vasild
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 80f39c9
| * @param[in] in_new Select addresses only from one table (true = new, false = tried, nullopt = both) | ||
| * @return Number of unique addresses that match specified options. | ||
| */ | ||
| size_t Size(std::optional<Network> net = {}, std::optional<bool> in_new = {}) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, if you re-touch: usually std::nullopt is used instead of {} and I think it is more readable, e.g.:
Line 236 in 79e18eb
| bool Rewind(std::optional<size_type> n = std::nullopt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do if I need to re-touch for another reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in #27015
|
ACK 80f39c9 |
|
I just noticed that this can lead to intermittent failures of the bitcoin/src/test/addrman_tests.cpp Line 951 in ba3d327
Size() before this PR, and now are executed in some runs). I think it might be best to just remove this line from the test, since this is a situation where we purposefully load a corrupted peers.dat - seems a good thing that the Addrman Check fails then... I will open a PR tomorrow.
|
dc70c1e addrman: Use std::nullopt instead of {} (Martin Zumsande) 59cc66a test: Remove AddrMan unit test that fails consistency checks (Martin Zumsande) Pull request description: Two fixups for #26847: * Now that `AddrMan::Size()` performs internal consistency tests (it didn't before), we can't call it in the `load_addrman_corrupted` unit tests, where we deal with an artificially corrupted `AddrMan`. This would fail the test when using `-checkaddrman=1` (leading to spurious CI fails). Therefore remove the tests assertion, which is not particularly helpful anyway (in production we abort init when peers.dat is corrupted instead of querying AddrMan in its corrupted state). (See bitcoin/bitcoin#26847 (comment)) * Use `std::nullopt` instead of `{}` for default args (suggested in bitcoin/bitcoin#26847 (comment)) ACKs for top commit: MarcoFalke: lgtm ACK dc70c1e Tree-SHA512: dd8a988e23d71a66d3dd30560bb653c9ad17db6915abfa5f722818b0ab18921051ec9223bfbc75d967df8bcd204dfe473d680bf68e8a8e4e4998fbb91dc973c5
dc70c1e addrman: Use std::nullopt instead of {} (Martin Zumsande) 59cc66a test: Remove AddrMan unit test that fails consistency checks (Martin Zumsande) Pull request description: Two fixups for bitcoin#26847: * Now that `AddrMan::Size()` performs internal consistency tests (it didn't before), we can't call it in the `load_addrman_corrupted` unit tests, where we deal with an artificially corrupted `AddrMan`. This would fail the test when using `-checkaddrman=1` (leading to spurious CI fails). Therefore remove the tests assertion, which is not particularly helpful anyway (in production we abort init when peers.dat is corrupted instead of querying AddrMan in its corrupted state). (See bitcoin#26847 (comment)) * Use `std::nullopt` instead of `{}` for default args (suggested in bitcoin#26847 (comment)) ACKs for top commit: MarcoFalke: lgtm ACK dc70c1e Tree-SHA512: dd8a988e23d71a66d3dd30560bb653c9ad17db6915abfa5f722818b0ab18921051ec9223bfbc75d967df8bcd204dfe473d680bf68e8a8e4e4998fbb91dc973c5
…ve precision of adding fixed seeds
, bitcoin#25619, bitcoin#27214, bitcoin#26261, bitcoin#27745, bitcoin#27529, bitcoin#28341, partial bitcoin#25331 (addrman backports: part 4) e544d3c fmt: apply `clang-format-diff.py` suggestions, satisfy linter (Kittywhiskers Van Gogh) 7da74ff merge bitcoin#28341: Use HashWriter over legacy CHashWriter (Kittywhiskers Van Gogh) c798b49 merge bitcoin#27529: fix `feature_addrman.py` on big-endian systems (Kittywhiskers Van Gogh) 7d149c9 merge bitcoin#27745: select addresses by network follow-up (Kittywhiskers Van Gogh) 1d82994 merge bitcoin#26261: cleanup `LookupIntern`, `Lookup` and `LookupHost` (Kittywhiskers Van Gogh) 231ff82 merge bitcoin#27214: Enable selecting addresses by network (Kittywhiskers Van Gogh) e825595 merge bitcoin#25619: avoid overriding non-virtual ToString() in CService and use better naming (Kittywhiskers Van Gogh) 2e9b48a merge bitcoin#26847: track AddrMan totals by network and table, improve precision of adding fixed seeds (Kittywhiskers Van Gogh) 79a550e merge bitcoin#26040: comment "add only reachable addresses to addrman" (Kittywhiskers Van Gogh) 1adb635 merge bitcoin#25678: skip querying dns seeds if -onlynet disables IPv4 and IPv6 (Kittywhiskers Van Gogh) 2d99be0 partial bitcoin#25331: Add HashWriter without ser-type and ser-version (Kittywhiskers Van Gogh) Pull request description: ## Additional Information * Dependent on #6243 * Dependent on #5167 ## Breaking Changes None observed. ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)** - [x] I have added or updated relevant unit/integration/functional/e2e tests - [x] I have made corresponding changes to the documentation **(note: N/A)** - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: UdjinM6: utACK e544d3c Tree-SHA512: 63f014142c39c47bda3ac85dc6afeee8f2bfec71f033631bca16d41bb0785f4b090b3c860ddc3b3cf6c4a23558d3d102144fc83b065130c3f9ab91d0de8e4457
…t test) 925870d merge bitcoin#27015: bitcoin#26847 fixups (AddrMan totals) (Kittywhiskers Van Gogh) Pull request description: ## Additional Information As part of [dash#6254](#6254), [bitcoin#26847](bitcoin#26847) was backported and since then, it was observed that unit tests were flakier than expected ([build](https://gitlab.com/dashpay/dash/-/jobs/7811041841), [build](https://gitlab.com/dashpay/dash/-/jobs/7802460298)). The flakiness was caused by behavior introduced by the aforementioned backport, this was resolved upstream with [bitcoin#27015](bitcoin#27015), which this pull request contains. ## Breaking Changes None observed. ## Checklist: - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)** - [x] I have added or updated relevant unit/integration/functional/e2e tests - [x] I have made corresponding changes to the documentation **(note: N/A)** - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: UdjinM6: utACK 925870d knst: ACK 925870d Tree-SHA512: 20fc8fb1b162803a71ec4087685460f52ed56c3c86d46ecac4cc0ef59c95b4b6206f0c53bef256242a4a5babb76e3564cfba56a84cbe844e187035de2308b818
AddrMan currently doesn't track the number of its entries by network, it only knows the total number of addresses. This PR makes AddrMan keep track of these numbers, which would be helpful for multiple things:
-onlynetsettings (such that addrs that used to be reachable are no longer and vice versa), in which case they currently could get stuck and not find any outbound peers. The second commit of this PR implements this.AddrMan::Select()to do this requires internal knowledge of the current number of addresses for each network and table to avoid getting stuck in endless loops../bitcoin-cli -addrinfo) is rather indirect by doing the aggregation itself in each call, doesn't distinguish between new and tried, and being based onAddrMan::GetAddr()it's also subject to a quality filter which we probably don't want in this spot.