-
Notifications
You must be signed in to change notification settings - Fork 38.7k
[addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups #15486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think this should resolve #15484, but it's not clear to me if this is entirely sufficient to protect against the class of error, or if we should add additional logic to ResolveCollisions() to ensure that collisions get eventually resolved. |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
|
Updated with a commit that tries to fix the underlying problem in addrman, by doing eviction anyway after 20 minutes (to prevent m_tried_collisions from never being drained). If it looks good, I'll update the OP of this PR before merge. |
|
This bugfix should probably get tagged for 0.18. |
Fixes a bug where feelers could be stuck trying to resolve a collision in the tried table that is to an address in the same netgroup as an existing outbound peer. Thanks to Muoi Tran for the original bug report and detailed debug logs to track this down.
2410d33 to
1b0621b
Compare
After 40 minutes, time out a test-before-evict entry and just evict without testing. Otherwise, if we were unable to test an entry for some reason, we might break using feelers altogether.
1b0621b to
f71fdda
Compare
|
utACK f71fdda |
|
Useful background info for context:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK f71fdda, but would be nice to have a test that demonstrates the original bug.
As @sipa suggested a few years ago, let's put a link to that paper in a comment somewhere.
|
Addressed nits. |
|
utACK |
|
To clarify what I brought up inline: Lines 240 to 253 in 20e6ea2
The log statement says we're moving it to For the case
And @EthanHeilman replied:
But a few lines down it seems we're doing the opposite, namely bias ourselves towards evicting: Lines 572 to 579 in 20e6ea2
We don't know anything about this new entry, because we couldn't connect, so it seems to me it would be more consistent to keep the old entry. It's still much better than getting stuck of course, so utACK 20e6ea2. |
When we insert the new entry in tried in the first place, it's because we've actually tried it. When inserting it we may find that it causes us to want to evict an old tried entry (which has also been tried, but not as recently). What the code is supposed to do (post try before evict) is to test the old tried entry before evicting it, in order to give it a second chance to stay around iff its still connectible. |
|
Ah, I think I get it now... since we don't specifically track when things were added to If we receive a version message: bitcoin/src/net_processing.cpp Line 1765 in 20e6ea2
From an outbound node: bitcoin/src/net_processing.cpp Line 1885 in 20e6ea2
That's when we call |
That's not necessarily when the entry was added to the |
|
utACK |
|
I think this is ready to go -- anything left that I missed? |
|
utACK 20e6ea2 |
…ow feeler connections to existing outbound netgroups 20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar) f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) 4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) 4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this. This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained. Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes. Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
Github-Pull: bitcoin#15486 Rebased-From: 4d83401
Fixes a bug where feelers could be stuck trying to resolve a collision in the tried table that is to an address in the same netgroup as an existing outbound peer. Thanks to Muoi Tran for the original bug report and detailed debug logs to track this down. Github-Pull: bitcoin#15486 Rebased-From: 4991e3c
After 40 minutes, time out a test-before-evict entry and just evict without testing. Otherwise, if we were unable to test an entry for some reason, we might break using feelers altogether. Github-Pull: bitcoin#15486 Rebased-From: f71fdda
Github-Pull: bitcoin#15486 Rebased-From: 20e6ea2
|
Done by @laanwj in f810f14...b80dedb. |
Fixes a bug where feelers could be stuck trying to resolve a collision in the tried table that is to an address in the same netgroup as an existing outbound peer. Thanks to Muoi Tran for the original bug report and detailed debug logs to track this down. Github-Pull: #15486 Rebased-From: 4991e3c Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
After 40 minutes, time out a test-before-evict entry and just evict without testing. Otherwise, if we were unable to test an entry for some reason, we might break using feelers altogether. Github-Pull: #15486 Rebased-From: f71fdda Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
Github-Pull: bitcoin#15486 Rebased-From: 4d83401 Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e
Fixes a bug where feelers could be stuck trying to resolve a collision in the tried table that is to an address in the same netgroup as an existing outbound peer. Thanks to Muoi Tran for the original bug report and detailed debug logs to track this down. Github-Pull: bitcoin#15486 Rebased-From: 4991e3c Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
After 40 minutes, time out a test-before-evict entry and just evict without testing. Otherwise, if we were unable to test an entry for some reason, we might break using feelers altogether. Github-Pull: bitcoin#15486 Rebased-From: f71fdda Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
Github-Pull: bitcoin#15486 Rebased-From: 20e6ea2 Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
…ow feeler connections to existing outbound netgroups Summary: 20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar) f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) 4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) 4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this. This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained. Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes. Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6 Backport of Core [[bitcoin/bitcoin#15486 | PR15486]] This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master. Test Plan: ninja ninja check-all Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix Subscribers: jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D6489
…ow feeler connections to existing outbound netgroups Summary: 20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar) f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) 4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) 4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this. This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained. Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes. Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6 Backport of Core [[bitcoin/bitcoin#15486 | PR15486]] This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master. Test Plan: ninja ninja check-all Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix Subscribers: jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D6489
…eler connections to existing outbound netgroups ccb6a4a [addrman] Improve collision logging and address nits (Suhas Daftuar) aa3c805 [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) efd1d6a [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) afcffc4 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: Two pretty direct back ports for the address manager. Coming from bitcoin#13115 and bitcoin#15486. ACKs for top commit: random-zebra: rebase utACK ccb6a4a, and merging... Tree-SHA512: 89d42b3164f92fcee69bc981c7e0887cab09820088732297dbcadbf4ccfc3257e9564ef2b394205f3153c8a03e4c0a0e8145b1c291e8cd980dc06b0ffa65a28c
…and allow feeler connections to existing outbound netgroups 20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar) f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) 4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) 4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this. This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained. Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes. Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
…and allow feeler connections to existing outbound netgroups 20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar) f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar) 4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar) 4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar) Pull request description: The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this. This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained. Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes. Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.
This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (
m_tried_collisions) would never be drained.Also, ensure that all entries don't linger in
m_tried_collisionsby evicting an old entry if its collisions is unresolved after 40 minutes.