Skip to content

Conversation

@sdaftuar
Copy link
Member

@sdaftuar sdaftuar commented Feb 26, 2019

The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (m_tried_collisions) would never be drained.

Also, ensure that all entries don't linger in m_tried_collisions by evicting an old entry if its collisions is unresolved after 40 minutes.

@sdaftuar
Copy link
Member Author

I think this should resolve #15484, but it's not clear to me if this is entirely sufficient to protect against the class of error, or if we should add additional logic to ResolveCollisions() to ensure that collisions get eventually resolved.

@DrahtBot
Copy link
Contributor

DrahtBot commented Feb 26, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@sdaftuar
Copy link
Member Author

Updated with a commit that tries to fix the underlying problem in addrman, by doing eviction anyway after 20 minutes (to prevent m_tried_collisions from never being drained).

If it looks good, I'll update the OP of this PR before merge.

@fanquake fanquake added the P2P label Feb 26, 2019
@gmaxwell
Copy link
Contributor

This bugfix should probably get tagged for 0.18.

@fanquake fanquake added this to the 0.18.0 milestone Feb 26, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.
@sdaftuar sdaftuar force-pushed the 2019-02-addrman-collisions branch from 2410d33 to 1b0621b Compare February 27, 2019 14:34
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.
@sdaftuar sdaftuar force-pushed the 2019-02-addrman-collisions branch from 1b0621b to f71fdda Compare February 27, 2019 21:54
@sipa
Copy link
Member

sipa commented Feb 28, 2019

utACK f71fdda

@Sjors
Copy link
Member

Sjors commented Mar 1, 2019

Useful background info for context:

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK f71fdda, but would be nice to have a test that demonstrates the original bug.

As @sipa suggested a few years ago, let's put a link to that paper in a comment somewhere.

@sdaftuar
Copy link
Member Author

sdaftuar commented Mar 1, 2019

Addressed nits.

@sdaftuar sdaftuar changed the title [net] Allow feeler connections to go to the same netgroup as existing outbound peers [addrman, net] Ensure tried collisions resolve, and allow feeler connections to existing outbound netgroups Mar 1, 2019
@gmaxwell
Copy link
Contributor

gmaxwell commented Mar 1, 2019

utACK

@Sjors
Copy link
Member

Sjors commented Mar 2, 2019

To clarify what I brought up inline:

bitcoin/src/addrman.cpp

Lines 240 to 253 in 20e6ea2

// Will moving this address into tried evict another entry?
if (test_before_evict && (vvTried[tried_bucket][tried_bucket_pos] != -1)) {
// Output the entry we'd be colliding with, for debugging purposes
auto colliding_entry = mapInfo.find(vvTried[tried_bucket][tried_bucket_pos]);
LogPrint(BCLog::ADDRMAN, "Collision inserting element into tried table (%s), moving %s to m_tried_collisions=%d\n", colliding_entry != mapInfo.end() ? colliding_entry->second.ToString() : "", addr.ToString(), m_tried_collisions.size());
if (m_tried_collisions.size() < ADDRMAN_SET_TRIED_COLLISION_SIZE) {
m_tried_collisions.insert(nId);
}
} else {
LogPrint(BCLog::ADDRMAN, "Moving %s to tried\n", addr.ToString());
// move nId to the tried tables
MakeTried(info, nId);
}

The log statement says we're moving it to m_tried_collisions, but that's not true if that's full. Hence my suggestion to move the log statement inside the if statement (line 245).

For the case m_tried_collisions is full @sdaftuar wrote:

I don't think it's quite as simple as "when in doubt, don't evict". I think in the case of a collision we bias ourselves towards not evicting, unless the peer we might evict can't be connected to via feeler connection.

My thought is that if m_tried_collisions is ever full, it seems like that is a scenario where we may be under attack; so not replacing things in our tried table (which we assume to have some good peers in it) is safe, even if not optimal. Maybe @EthanHeilman or @gmaxwell has a better intuition for how this works though.

And @EthanHeilman replied:

I agree with your intuition.

We bias toward not evicting. The assumption is that if the tried table is contains many evil IPs we have already lost and won't hear about good IPs. Thus, we assume the tried has many good IPs and that they should not be evicted unless a particular IP in tried is offline and thus no longer provides security.

But a few lines down it seems we're doing the opposite, namely bias ourselves towards evicting:

bitcoin/src/addrman.cpp

Lines 572 to 579 in 20e6ea2

} else if (GetAdjustedTime() - info_new.nLastSuccess > ADDRMAN_TEST_WINDOW) {
// If the collision hasn't resolved in some reasonable amount of time,
// just evict the old entry -- we must not be able to
// connect to it for some reason.
LogPrint(BCLog::ADDRMAN, "Unable to test; replacing %s with %s in tried table anyway\n", info_old.ToString(), info_new.ToString());
Good_(info_new, false, GetAdjustedTime());
erase_collision = true;
}

We don't know anything about this new entry, because we couldn't connect, so it seems to me it would be more consistent to keep the old entry.

It's still much better than getting stuck of course, so utACK 20e6ea2.

@gmaxwell
Copy link
Contributor

gmaxwell commented Mar 2, 2019

We don't know anything about this new entry,

When we insert the new entry in tried in the first place, it's because we've actually tried it. When inserting it we may find that it causes us to want to evict an old tried entry (which has also been tried, but not as recently). What the code is supposed to do (post try before evict) is to test the old tried entry before evicting it, in order to give it a second chance to stay around iff its still connectible.

@Sjors
Copy link
Member

Sjors commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table, which coincides with the time we last heard from it, i.e. when it last sent us a version message. And that's also when it's added to m_tried_collisions.

If we receive a version message:

if (strCommand == NetMsgType::VERSION) {

From an outbound node:

if (!pfrom->fInbound)

That's when we call MarkAddressGood in net.cpp which in turn calls CAddrMan::Good_, which adds it to m_tried_collisions.

@sdaftuar
Copy link
Member Author

sdaftuar commented Mar 2, 2019

Ah, I think I get it now... since we don't specifically track when things were added to m_tried_collisions, the check info_new.nLastSuccess > ADDRMAN_TEST_WINDOW is just the way we determine when the entry was added to the new table

That's not necessarily when the entry was added to the new table -- we can do that when learning about an address, for instance when relayed to us via an addr message from another peer. But otherwise I think your description is correct (in particular, that should be the time that we attempted to put it in the tried table but discovered the collision).

@naumenkogs
Copy link
Contributor

utACK

@sdaftuar
Copy link
Member Author

sdaftuar commented Mar 6, 2019

I think this is ready to go -- anything left that I missed?

@laanwj
Copy link
Member

laanwj commented Mar 9, 2019

utACK 20e6ea2

@laanwj laanwj merged commit 20e6ea2 into bitcoin:master Mar 9, 2019
laanwj added a commit that referenced this pull request Mar 9, 2019
…ow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Mar 9, 2019
@fanquake
Copy link
Member

fanquake commented Mar 9, 2019

Backported in #15563.

Done by @laanwj in f810f14...b80dedb.

laanwj pushed a commit that referenced this pull request Mar 9, 2019
Github-Pull: #15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e
laanwj pushed a commit that referenced this pull request Mar 9, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: #15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
laanwj pushed a commit that referenced this pull request Mar 9, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: #15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
laanwj pushed a commit that referenced this pull request Mar 9, 2019
Github-Pull: #15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019
Github-Pull: bitcoin#15486
Rebased-From: 4d83401
Tree-SHA512: c7843191c470d8b3298d4375632ddbcfd26358d21100bcbffa8d40498782f9dfef08cfb7b3c040525ef912b27ddd593322b60f8f21e16e2ae9bae66433ce807e
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019
Fixes a bug where feelers could be stuck trying to resolve a collision in the
tried table that is to an address in the same netgroup as an existing outbound peer.

Thanks to Muoi Tran for the original bug report and detailed debug logs to track
this down.

Github-Pull: bitcoin#15486
Rebased-From: 4991e3c
Tree-SHA512: 2752c9909d55ff63b9143168033d0d3952effba7f911181919eb62291edf822ec54fffbb20e35b5c5f8cb1092d75c496665da00139dbebe5ce402cbea3ad87c5
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019
After 40 minutes, time out a test-before-evict entry and just evict without
testing. Otherwise, if we were unable to test an entry for some reason, we
might break using feelers altogether.

Github-Pull: bitcoin#15486
Rebased-From: f71fdda
Tree-SHA512: 66c61a9f030b1666e98e4fe66dd6cfc5f1d03b2a1ec01567b195903db6e4412ac778f4464ee9ef35ae6faa1ab7e4b18ef7ecb9a7ced29e6494046990aebf7b76
HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Mar 10, 2019
Github-Pull: bitcoin#15486
Rebased-From: 20e6ea2
Tree-SHA512: 73b37a98cf4d9ede1d39c30d178d63a5a6865dba6cb7a9f33bd1e03445acb708b3007c7cde991b5de96a407262adda23279fe7a1d2ed31b0b5a33b2e97aaba9b
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 18, 2020
…ow feeler connections to existing outbound netgroups

Summary:
20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

Backport of Core [[bitcoin/bitcoin#15486 | PR15486]]

This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master.

Test Plan:
  ninja
  ninja check-all

Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D6489
ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Aug 17, 2020
…ow feeler connections to existing outbound netgroups

Summary:
20e6ea259b222b10f066f22695a5f56c52071f63 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda3bc2e7acd2a8b74e882364866b8b0f55b [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c813c9848d3b3957ea3ad433f02fca9e81 [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d834018e368c3481a5421891395f64aa9002185 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6

Backport of Core [[bitcoin/bitcoin#15486 | PR15486]]

This PR was in response to bitcoin/bitcoin#15484, but I did not find the issue mentioned [[ bitcoin/bitcoin#15484 (comment) | here ]] in my debug log while running master.

Test Plan:
  ninja
  ninja check-all

Reviewers: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, jasonbcox, deadalnix

Subscribers: jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D6489
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Apr 8, 2021
…eler connections to existing outbound netgroups

ccb6a4a [addrman] Improve collision logging and address nits (Suhas Daftuar)
aa3c805 [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
efd1d6a [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
afcffc4 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  Two pretty direct back ports for the address manager.
  Coming from bitcoin#13115 and bitcoin#15486.

ACKs for top commit:
  random-zebra:
    rebase utACK ccb6a4a, and merging...

Tree-SHA512: 89d42b3164f92fcee69bc981c7e0887cab09820088732297dbcadbf4ccfc3257e9564ef2b394205f3153c8a03e4c0a0e8145b1c291e8cd980dc06b0ffa65a28c
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 25, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 26, 2021
…and allow feeler connections to existing outbound netgroups

20e6ea2 [addrman] Improve collision logging and address nits (Suhas Daftuar)
f71fdda [addrman] Ensure collisions eventually get resolved (Suhas Daftuar)
4991e3c [net] feeler connections can be made to outbound peers in same netgroup (Suhas Daftuar)
4d83401 [addrman] Improve tried table collision logging (Suhas Daftuar)

Pull request description:

  The restriction on outbound peers sharing the same network group is not intended to apply to feeler connections, so fix this.

  This fixes an issue where a tried table collision with an entry to a netgroup we already have an outbound connection to could cause feelers to stop working, because the tried collision buffer (`m_tried_collisions`) would never be drained.

  Also, ensure that all entries don't linger in `m_tried_collisions` by evicting an old entry if its collisions is unresolved after 40 minutes.

Tree-SHA512: 553fe2b01b82cd7f0f62f90c6781e373455a45b254e3bec085b5e6b16690aa9f3938e8c50e7136f19dafa250ed4578a26227d944b76daf9ce4ef0c75802389b6
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants