Skip to content

Conversation

@maflcko
Copy link
Member

@maflcko maflcko commented Apr 27, 2020

The CNode member nSendBytes is incremented under the node's lock cs_vSend. However, RecordBytesSent is not. An RPC thread that acquires the cs_vSend lock puts the message handler thread on hold. While the thread is on hold, getnettotals returns "stale" values or values that don't add up.

This can be fixed by making cs_vSend a "write lock" for the total bytes sent in connman.

@DrahtBot DrahtBot added the P2P label Apr 27, 2020
@maflcko maflcko added this to the 0.21.0 milestone Apr 28, 2020
The CNode member nSendBytes is incremented under the node's lock
cs_vSend. However, RecordBytesSent is not. An RPC thread that acquires
the cs_vSend lock puts the message handler thread on hold. While the
thread is on hold, getnettotals returns "stale" values or values that
don't add up.

This can be fixed by making cs_vSend a "write lock" for the total bytes
sent in connman.

After this commit, both calls to RecordBytesSent are done under the
LOCK(pnode->cs_vSend);
@maflcko maflcko force-pushed the 2004-netLockRecordBytesSent branch from ffffedf to faae6ca Compare May 13, 2020 16:11
@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 23, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@troygiorshev
Copy link
Contributor

troygiorshev commented Jul 24, 2020

ACK faae6ca

Reviewed, ran tests.

This is a nice readability improvement too 📖

Changed my mind, see below.

@jnewbery
Copy link
Contributor

jnewbery commented Jul 31, 2020

getnettotals returns "stale" values or values that don't add up

Can you explain this a bit? What do the values not add up to? As far as I can tell, getnettotals simply returns the totalbytessent and totalbytesreceived at some point in time. There are not consistency guarantees with subsequent calls to the getpeerinfo values.

@troygiorshev
Copy link
Contributor

troygiorshev commented Aug 5, 2020

I've changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn't try and make them sync up. nTotalBytesSent is separate from a node's nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

However, I think a PR like this is needed to bring consistency between the two calls to RecordBytesSent (as it stands before this PR, one is under cs_vSend and the other is not). I can see a few more improvements available too.

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

What is the motivation to have the lock when calling RecordBytesSent? got it.

@maflcko
Copy link
Member Author

maflcko commented Aug 27, 2020

I've changed my mind on this one. getnettotals and getpeerinfo are two separate rpcs, we shouldn't try and make them sync up. nTotalBytesSent is separate from a node's nSendBytes. Looking at the analogous structures on the receiving side, nTotalBytesRecv is clearly separate from nRecvBytes.

Why are they separate? Let's assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

@troygiorshev
Copy link
Contributor

Why are they separate? Let's assume we connect to a peer, send 100 bytes, disconnect. Send 100 bytes to another peer, then the total bytes sent should simply equal 200 bytes.

Maybe I'm completely off-base here, I'm not trying to make a big statement.

Isn't it impossible to keep these RPCs synced? They are two separate calls, something might happen in between them.

  1. Call getnettotals, note the value of totalbytesrecv
  2. node receives a message from another peer, of size 100 bytes
  3. Call getpeerinfo, see that the sum of bytesrecv for each node is 100 bytes more than totalbytesrecv from above

@maflcko
Copy link
Member Author

maflcko commented Aug 27, 2020

Ok, I see. The reason I created this pull was that a call to

        net_totals_before = self.nodes[0].getnettotals()
        peer_info = self.nodes[0].getpeerinfo()
        net_totals_after = self.nodes[0].getnettotals()

and then asserting that before <= sum <= after would sometimes fail.

I haven't looked into #19673 so I am wondering if #19673 also fixes that issue.

@troygiorshev
Copy link
Contributor

^

Woah that shouldn't happen. Will investigate.

@maflcko
Copy link
Member Author

maflcko commented Aug 27, 2020

#17107

@jnewbery
Copy link
Contributor

a call to

   net_totals_before = self.nodes[0].getnettotals()
   peer_info = self.nodes[0].getpeerinfo()
   net_totals_after = self.nodes[0].getnettotals()

and then asserting that before <= sum <= after would sometimes fail.`

What is sum here? Are you saying that getnettotals.totalbytessent can go down?

I don't understand the problem you're trying to solve.

@maflcko
Copy link
Member Author

maflcko commented Oct 28, 2020

Are you saying that getnettotals.totalbytessent can go down?

No, it is returning the value it returned in a previous call. But getpeerinfo already returned the new correct value.

#17107 (comment)

@jnewbery
Copy link
Contributor

I agree with Wlad's assessment in that issue:

these are meant to be overall statistics, it doesn't seem super important that the net totals are immediately up to date.

@maflcko
Copy link
Member Author

maflcko commented Oct 28, 2020

Ok, then the test is wrong and should be removed:

diff --git a/test/functional/rpc_net.py b/test/functional/rpc_net.py
index 03c858c694..afb67891af 100755
--- a/test/functional/rpc_net.py
+++ b/test/functional/rpc_net.py
@@ -104,29 +104,15 @@ class NetTest(BitcoinTestFramework):
 
     def test_getnettotals(self):
         self.log.info("Test getnettotals")
-        # getnettotals totalbytesrecv and totalbytessent should be
-        # consistent with getpeerinfo. Since the RPC calls are not atomic,
-        # and messages might have been recvd or sent between RPC calls, call
-        # getnettotals before and after and verify that the returned values
-        # from getpeerinfo are bounded by those values.
-        net_totals_before = self.nodes[0].getnettotals()
         peer_info = self.nodes[0].getpeerinfo()
-        net_totals_after = self.nodes[0].getnettotals()
-        assert_equal(len(peer_info), 2)
-        peers_recv = sum([peer['bytesrecv'] for peer in peer_info])
-        peers_sent = sum([peer['bytessent'] for peer in peer_info])
-
-        assert_greater_than_or_equal(peers_recv, net_totals_before['totalbytesrecv'])
-        assert_greater_than_or_equal(net_totals_after['totalbytesrecv'], peers_recv)
-        assert_greater_than_or_equal(peers_sent, net_totals_before['totalbytessent'])
-        assert_greater_than_or_equal(net_totals_after['totalbytessent'], peers_sent)
+        net_totals = self.nodes[0].getnettotals()
 
         # test getnettotals and getpeerinfo by doing a ping
         # the bytes sent/received should change
         # note ping and pong are 32 bytes each
         self.nodes[0].ping()
-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals_after['totalbytessent'] + 32 * 2), timeout=1)
-        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals_after['totalbytesrecv'] + 32 * 2), timeout=1)
+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytessent'] >= net_totals['totalbytessent'] + 32 * 2), timeout=1)
+        self.wait_until(lambda: (self.nodes[0].getnettotals()['totalbytesrecv'] >= net_totals['totalbytesrecv'] + 32 * 2), timeout=1)
 
         peer_info_after_ping = self.nodes[0].getpeerinfo()
         for before, after in zip(peer_info, peer_info_after_ping):

@jnewbery
Copy link
Contributor

the test is wrong and should be removed:

Done: #20258

@maflcko
Copy link
Member Author

maflcko commented Oct 28, 2020

thanks

@maflcko maflcko closed this Oct 28, 2020
@maflcko maflcko deleted the 2004-netLockRecordBytesSent branch October 28, 2020 10:21
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants