net: RecordBytesSent under cs_vSend lock #18784

maflcko · 2020-04-27T17:24:52Z

The CNode member nSendBytes is incremented under the node's lock cs_vSend. However, RecordBytesSent is not. An RPC thread that acquires the cs_vSend lock puts the message handler thread on hold. While the thread is on hold, getnettotals returns "stale" values or values that don't add up.

This can be fixed by making cs_vSend a "write lock" for the total bytes sent in connman.

After this commit, both calls to RecordBytesSent are done under LOCK(pnode->cs_vSend).
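To make the ordering issue in the description concrete, here is a minimal Python sketch. It is not Bitcoin Core's C++ code: `Peer`, `ConnMan`, `send`, `snapshot`, `observe` and the `between` hook are hypothetical names, with `send_lock` standing in for cs_vSend, `send_bytes` for nSendBytes and `total_bytes_sent` for the total kept in connman. The hook simulates another thread being scheduled right after the per-peer lock is released.

```python
import threading


class Peer:
    """Hypothetical stand-in for CNode."""

    def __init__(self):
        self.send_lock = threading.Lock()  # plays the role of cs_vSend
        self.send_bytes = 0                # plays the role of nSendBytes


class ConnMan:
    """Hypothetical stand-in for the global counter kept in connman."""

    def __init__(self):
        self.total_lock = threading.Lock()
        self.total_bytes_sent = 0

    def record_bytes_sent(self, n):
        with self.total_lock:
            self.total_bytes_sent += n


def snapshot(peer, connman):
    """Read the per-peer counter under its lock, then read the total."""
    with peer.send_lock:
        peer_bytes = peer.send_bytes
    with connman.total_lock:
        total = connman.total_bytes_sent
    return peer_bytes, total


def send(peer, connman, n, total_under_lock, between=None):
    """Account for n sent bytes; `between` runs right after the peer lock is released."""
    with peer.send_lock:
        peer.send_bytes += n
        if total_under_lock:
            connman.record_bytes_sent(n)  # total updated while the peer lock is held
    if between is not None:
        between()  # simulated reader scheduled in the gap
    if not total_under_lock:
        connman.record_bytes_sent(n)      # total updated only after the gap


def observe(total_under_lock):
    """Return the (peer bytes, total bytes) pair a reader sees in the gap."""
    peer, connman = Peer(), ConnMan()
    seen = []
    send(peer, connman, 100, total_under_lock,
         between=lambda: seen.append(snapshot(peer, connman)))
    return seen[0]


print(observe(total_under_lock=False))  # (100, 0): peer counter ahead of the total
print(observe(total_under_lock=True))   # (100, 100): both counters agree
```

In this model, any reader that sees the per-peer counter under the peer lock is guaranteed to find the total already updated, which is the "write lock" idea from the description.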

DrahtBot · 2020-07-23T01:41:56Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19673 (Move cs_vSend into SocketSendData and resolve RecordBytesSent lock inconsistency by troygiorshev)
#19509 (Per-Peer Message Capture by troygiorshev)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

troygiorshev · 2020-07-24T12:51:39Z

~~ACK faae6ca~~

~~Reviewed, ran tests.~~

~~This is a nice readability improvement too 📖~~

Changed my mind, see below.

jnewbery · 2020-07-31T16:36:30Z

> getnettotals returns "stale" values or values that don't add up

Can you explain this a bit? What do the values not add up to? As far as I can tell, getnettotals simply returns totalbytessent and totalbytesrecv at some point in time. There are no consistency guarantees with the values returned by subsequent getpeerinfo calls.

troygiorshev · 2020-08-05T21:44:58Z

I've changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn't try and make them sync up. nTotalBytesSent is separate from a node's nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

However, I think a PR like this is needed to bring consistency between the two calls to RecordBytesSent (as it stands before this PR, one is under cs_vSend and the other is not). I can see a few more improvements available too.

promag

Concept ACK.

~~What is the motivation to have the lock when calling RecordBytesSent?~~ got it.

maflcko · 2020-08-27T15:09:10Z

> I've changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn't try and make them sync up. nTotalBytesSent is separate from a node's nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

Why are they separate? Let's assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

troygiorshev · 2020-08-27T15:35:35Z

> Why are they separate? Let's assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

Maybe I'm completely off-base here, I'm not trying to make a big statement.

Isn't it impossible to keep these RPCs synced? They are two separate calls, something might happen in between them.

1. Call getnettotals, note the value of totalbytesrecv.
2. The node receives a message from another peer, of size 100 bytes.
3. Call getpeerinfo, see that the sum of bytesrecv for each node is 100 bytes more than totalbytesrecv from above.
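The three steps above can be replayed with a toy model. This is only a sketch: the `Node` class below is hypothetical and merely borrows the RPC names, and the initial 500 bytes are an arbitrary illustrative value.

```python
class Node:
    """Hypothetical toy node that tracks received bytes per peer and in total."""

    def __init__(self):
        self.peer_recv = {}
        self.total_recv = 0

    def receive(self, peer_id, n):
        # Both counters are updated together, so the model itself is consistent.
        self.peer_recv[peer_id] = self.peer_recv.get(peer_id, 0) + n
        self.total_recv += n

    def getnettotals(self):
        return {"totalbytesrecv": self.total_recv}

    def getpeerinfo(self):
        return [{"id": pid, "bytesrecv": n}
                for pid, n in sorted(self.peer_recv.items())]


node = Node()
node.receive(0, 500)                  # earlier traffic (arbitrary amount)

totals = node.getnettotals()          # step 1: note totalbytesrecv
node.receive(1, 100)                  # step 2: 100 bytes arrive from another peer
peers = node.getpeerinfo()            # step 3: per-peer view taken afterwards

diff = sum(p["bytesrecv"] for p in peers) - totals["totalbytesrecv"]
print(diff)  # 100
```

Even with perfectly synchronized counters, the two calls are separate snapshots, so the per-peer sum can exceed the earlier total by whatever arrived in between.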

maflcko · 2020-08-27T15:51:32Z

Ok, I see. The reason I created this pull was that a call to

        net_totals_before = self.nodes[0].getnettotals()
        peer_info = self.nodes[0].getpeerinfo()
        net_totals_after = self.nodes[0].getnettotals()

and then asserting that before <= sum <= after would sometimes fail.

I haven't looked into #19673 so I am wondering if #19673 also fixes that issue.

troygiorshev · 2020-08-27T15:57:41Z

^

Woah that shouldn't happen. Will investigate.

maflcko · 2020-08-27T16:14:23Z

#17107

jnewbery · 2020-10-28T09:00:25Z

> a call to
>
>     net_totals_before = self.nodes[0].getnettotals()
>     peer_info = self.nodes[0].getpeerinfo()
>     net_totals_after = self.nodes[0].getnettotals()
>
> and then asserting that before <= sum <= after would sometimes fail.

What is sum here? Are you saying that getnettotals.totalbytessent can go down?

I don't understand the problem you're trying to solve.

maflcko · 2020-10-28T09:19:55Z

> Are you saying that getnettotals.totalbytessent can go down?

No, it is returning the value it returned in a previous call. But getpeerinfo already returned the new correct value.
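A worked example of that failure, using made-up numbers rather than values from a real node; the helper `bounded_by_totals` is a hypothetical name for the before <= sum <= after check:

```python
def bounded_by_totals(before, peers_sum, after):
    """Return True when before <= peers_sum <= after."""
    return before <= peers_sum <= after


# Illustrative numbers only.
before = 1000       # totalbytessent from the first getnettotals call
peers_sum = 1100    # sum of bytessent from getpeerinfo, already counting a new 100-byte send
after_stale = 1000  # second getnettotals call, total not yet updated
after_fresh = 1100  # second getnettotals call once the total has caught up

stale_ok = bounded_by_totals(before, peers_sum, after_stale)
fresh_ok = bounded_by_totals(before, peers_sum, after_fresh)
print(stale_ok, fresh_ok)  # False True
```

The total never decreases here; the check fails only because the second total still lags behind the per-peer sum.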

#17107 (comment)

jnewbery · 2020-10-28T09:34:43Z

I agree with Wlad's assessment in that issue:

> these are meant to be overall statistics, it doesn't seem super important that the net totals are immediately up to date.

maflcko · 2020-10-28T09:42:05Z

Ok, then the test is wrong and should be removed:

diff --git a/test/functional/rpc_net.py b/test/functional/rpc_net.py
index 03c858c694..afb67891af 100755
--- a/test/functional/rpc_net.py
+++ b/test/functional/rpc_net.py
@@ -104,29 +104,15 @@ class NetTest(BitcoinTestFramework):
 
     def test_getnettotals(self):
         self.log.info("Test getnettotals")
-        # getnettotals totalbytesrecv and totalbytessent should be
-        # consistent with getpeerinfo. Since the RPC calls are not atomic,
-        # and messages might have been recvd or sent between RPC calls, call
-        # getnettotals before and after and verify that the returned values
-        # from getpeerinfo are bounded by those values.
-        net_totals_before = self.nodes[0].getnettotals()
         peer_info = self.nodes[0].getpeerinfo()
-        net_totals_after = self.nodes[0].getnettotals()
-        assert_equal(len(peer_info), 2)
-        peers_recv = sum([peer['bytesrecv'] for peer in peer_info])
-        peers_sent = sum([peer['bytessent'] for peer in peer_info])
-
-        assert_greater_than_or_equal(peers_recv, net_totals_before['totalbytesrecv'])
-        assert_greater_than_or_equal(net_totals_after['totalbytesrecv'], peers_recv)
-        assert_greater_than_or_equal(peers_sent, net_totals_before['totalbytessent'])
-        assert_greater_than_or_equal(net_totals_after['totalbytessent'], peers_sent)
+        net_totals = self.nodes[0].getnettotals()
 
         # test getnettotals and getpeerinfo by doing a ping
         # the bytes sent/received should change
         # note ping and pong are 32 bytes each
         self.nodes[0].ping()
-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals_after['totalbytessent'] + 32 * 2), timeout=1)
-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals_after['totalbytesrecv'] + 32 * 2), timeout=1)
+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals['totalbytessent'] + 32 * 2), timeout=1)
+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals['totalbytesrecv'] + 32 * 2), timeout=1)
 
         peer_info_after_ping = self.nodes[0].getpeerinfo()
         for before, after in zip(peer_info, peer_info_after_ping):

jnewbery · 2020-10-28T10:19:13Z

> the test is wrong and should be removed:

Done: #20258

maflcko · 2020-10-28T10:21:53Z

thanks

DrahtBot added the P2P label Apr 27, 2020

maflcko added this to the 0.21.0 milestone Apr 28, 2020

maflcko force-pushed the 2004-netLockRecordBytesSent branch from ffffedf to faae6ca May 13, 2020 16:11

DrahtBot mentioned this pull request Jul 22, 2020: Per-Peer Message Capture #19509 (Merged)

This was referenced Aug 5, 2020:

Move cs_vSend into SocketSendData and resolve RecordBytesSent lock inconsistency troygiorshev/bitcoin#4 (Closed)
Move cs_vSend into SocketSendData and resolve RecordBytesSent lock inconsistency #19673 (Closed)

promag reviewed Aug 15, 2020

maflcko closed this Oct 28, 2020

maflcko deleted the 2004-netLockRecordBytesSent branch October 28, 2020 10:21

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
