Conversation

@maflcko (Member) commented Jan 26, 2018

The getrawmempool RPC should wait for ATMP to return completely before sending back the pool contents. Otherwise, the syncwithvalidationinterface RPC might race against ATMP and be a no-op, even though it shouldn't.

When a thread calls ATMP, it acquires the cs_main lock. So we can wait for the release of cs_main to be sure that ATMP is done.

Effectively reverts #8244
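
For context, the shape of the change is to build the JSON reply first and only then block on cs_main (a sketch, not the exact diff; the temporary variable and parameter handling are illustrative):

UniValue getrawmempool(const JSONRPCRequest& request)
{
    bool fVerbose = false;
    if (!request.params[0].isNull()) fVerbose = request.params[0].get_bool();

    UniValue res = mempoolToJSON(fVerbose);
    // Wait for the ATMP calling thread to release the write lock:
    LOCK(cs_main);
    return res;
}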

@TheBlueMatt (Contributor) left a comment

A bit more background - we (essentially) use mempool.cs as a read lock and cs_main as a write lock for the mempool. However, we don't wait for cs_main in getrawmempool, so you can have a situation where you start an ATMP call, it holds cs_main, and then getrawmempool is completely unsynchronized with anything going on, confusing some tests. Ideally we'd move away from the rather-confused read-write-but-not-upgradeable mempool.cs, but a simple { LOCK(cs_main); } should fix the issue for now.
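
A sketch of the { LOCK(cs_main); } idea being suggested here (placement illustrative; this is the variant debated just below, not the eventual fix):

{
    // Scoped acquire-and-immediately-release: blocks this thread until any
    // in-flight ATMP call, which holds cs_main for its duration, completes.
    LOCK(cs_main);
}
return mempoolToJSON(fVerbose);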


    return mempoolToJSON(fVerbose);
    // Wait for ATMP calling thread to release the write lock:
    LOCK(cs_main);
@TheBlueMatt (Contributor) commented:

This needs to go before mempoolToJSON, no? I.e., just do a { LOCK(cs_main); }.

@maflcko (Member, Author) replied:

Acquiring the lock and immediately giving it back to the ATMP calling thread will not solve any races. (At least on my machine I can still see them.)

I guess I could hold it for the whole duration of getrawmempool, if that is what you'd like.

@TheBlueMatt (Contributor) replied:

Ah, oops, yes, sorry, indeed, wrong direction. I guess this is fine for now, but we really need to kill the mempool.cs/cs_main garbage.

@promag (Contributor) commented:

I don't understand why it matters to wait for cs_main when the result is already determined.

@maflcko (Member, Author) replied:

We want to wait for

    GetMainSignals().TransactionAddedToMempool(ptx);

which happens under cs_main, not pool.cs.
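
Condensed heavily, the ordering inside ATMP at the time looked roughly like this (a sketch, not the real function body; names other than the signal call are simplified):

bool AcceptToMemoryPool_sketch(CTxMemPool& pool, const CTransactionRef& ptx)
{
    LOCK(cs_main); // the "write lock", held for the whole call
    {
        LOCK(pool.cs); // the "read lock": guards the actual mempool mutation
        // ... policy checks and pool.addUnchecked(...) ...
    }
    // The notification fires after pool.cs is released but while cs_main
    // is still held - this is the step getrawmempool needs to wait for:
    GetMainSignals().TransactionAddedToMempool(ptx);
    return true;
}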

@maflcko (Member, Author) commented Jan 26, 2018

This was hit twice on Travis; you might find one of the logs here: https://travis-ci.org/bitcoin/bitcoin/jobs/332197296 (might disappear soon, on reset).

@promag (Contributor) commented Jan 27, 2018

> A bit more background - we (essentially) use mempool.cs as a read lock and cs_main as a write lock for the mempool.

Why not just use mempool.cs for both read and write? Stopping a lot of stuff just to dump the mempool sounds bad.

@maflcko (Member, Author) commented Jan 27, 2018

@promag Pull request welcome, but I'd prefer not to do major refactoring in a bug-fix pull request.

@promag (Contributor) commented Jan 27, 2018

Thanks for the explanation @MarcoFalke.

IMHO, and to cover other possible cases, cs_main could be locked right before returning in SyncWithValidationInterfaceQueue:

void SyncWithValidationInterfaceQueue() {
    AssertLockNotHeld(cs_main);
    // Block until the validation queue drains
    std::promise<void> promise;
    CallFunctionInValidationInterfaceQueue([&promise] {
        promise.set_value();
    });
    promise.get_future().wait();
    // Block until other tasks holding cs_main finish
    LOCK(cs_main);
}

Or in syncwithvalidationinterface RPC after SyncWithValidationInterfaceQueue().

This alternative acknowledges it's not a getrawmempool bug.
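
A sketch of that second placement (the RPC body is illustrative; the return value and naming merely follow the style of the surrounding RPC code):

static UniValue syncwithvalidationinterfacequeue(const JSONRPCRequest& request)
{
    // Drain the validation interface queue first ...
    SyncWithValidationInterfaceQueue();
    // ... then block until any ATMP caller still holding cs_main finishes:
    LOCK(cs_main);
    return NullUniValue;
}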

@TheBlueMatt (Contributor) commented Jan 28, 2018 via email

@devrandom commented:
The question for me is whether the production (i.e. non-test) use of getrawmempool benefits from the added synchronization. If taking the lock is only to synchronize against the test-only syncwithvalidationinterface, then it doesn't seem right to pay the lock cost in a production path.

Also, looking at the code, it does seem like cs_main is held for all of ATMP, so I don't understand the comment from @TheBlueMatt.

It seems to me that { LOCK(cs_main); } at the start of syncwithvalidationinterface should work fine (i.e. lock and immediate release).

@promag (Contributor) commented Feb 3, 2018

The idea is to guarantee that the result contains only transactions that have already been processed, not ones still waiting in the queue. Acquiring the lock before returning means that ATMP has finished.

@devrandom replied:
How does the user of getrawmempool use that synchronization guarantee? I.e., what follow-up action would be invalid if the tx is not completely processed?

(we can move to IRC if this is too much back and forth)

@promag (Contributor) commented Feb 3, 2018

@TheBlueMatt explains it above in #12273 (review).

I don't think this is a production/mainnet concern.

@devrandom replied:
I understand the comment you linked. It still seems a bit better to solve the test issue by synchronizing an RPC call that is never used in production rather than getrawmempool, which could be used in production. But the performance difference is likely very small, so it's not a strong concern.

@TheBlueMatt (Contributor) commented:
I disagree wholly that this is not a "production concern" - we could absolutely have users who do a getrawmempool and then call wallet functions based on the result, introducing a new race for them. I believe we should mark this for 0.16, as it is a regression.

@devrandom replied:
Wouldn't all such follow-up actions take cs_main, and therefore be safe?

@TheBlueMatt (Contributor) replied:
Err, sorry, not wallet calls - sendrawtransaction followed by a getrawmempool to verify it's there. If we can hit a realistic race in testing, we probably should consider it a potential production issue.

@laanwj added this to the 0.16.1 milestone Feb 6, 2018
@promag (Contributor) commented Feb 6, 2018

> Err, sorry, not wallet calls - sendrawtransaction followed by a getrawmempool to verify it's there

sendrawtransaction fails if the transaction is rejected by the mempool. But it's a regression, like you said.

@TheBlueMatt (Contributor) replied:
One naive usage (which is already at least somewhat racy, but...) would be to do a getrawmempool to cheaply look up the fee of a transaction you just sent. Not a massive deal, but definitely a regression worth fixing.

@devrandom replied:
With the proposed fix, sendrawtransaction followed by a getrawmempool is not guaranteed to return the sent transaction, since the lock is taken after the results are computed.

@TheBlueMatt (Contributor) replied:
Oh, I seem to have confused myself and mis-remembered the issue here - you cannot hit this purely from RPC, but you can hit wallet errors: if you're polling getrawmempool to wait for a transaction to appear in your mempool, you can then race ATMP and thus see a balance that is as of an old mempool, not the one you just got out of getrawmempool.

@maflcko modified the milestones: 0.16.1, 0.16.0 Feb 6, 2018
@maflcko (Member, Author) commented Feb 6, 2018

Changed milestone to 0.16

TheBlueMatt added a commit to TheBlueMatt/bitcoin that referenced this pull request Feb 6, 2018
This resolves an issue where getrawmempool() can race mempool
notification signals. Intuitively we use mempool.cs as a "read
lock" on the mempool with cs_main being the write lock, so holding
the read lock intermittently while doing write operations is
somewhat strange.
This also avoids the introduction of cs_main in getrawmempool()
which reviewers objected to in the previous fix in bitcoin#12273
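
Roughly, that approach holds the mempool lock across the whole call instead (a condensed sketch of the idea in #12368, not the actual patch):

static bool AcceptToMemoryPoolWorker_sketch(CTxMemPool& pool, const CTransactionRef& ptx)
{
    AssertLockHeld(cs_main);
    // Hold the mempool "read lock" for the duration of ATMP, so a reader
    // taking pool.cs cannot interleave between the insert and the end of
    // the call:
    LOCK(pool.cs);
    // ... checks, pool.addUnchecked(...), notifications ...
    return true;
}
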
@maflcko (Member, Author) commented Feb 6, 2018

There has been too much discussion around this simple fix.

Closing in favor of #12368

@maflcko closed this Feb 6, 2018
@maflcko deleted the Mf1801-rpcMempoolGetLock branch February 6, 2018 18:55
laanwj added a commit that referenced this pull request Feb 8, 2018
02fc886 Add braces to meet code style on line-after-the-one-changed. (Matt Corallo)
85aa839 Hold mempool.cs for the duration of ATMP. (Matt Corallo)

Pull request description:

  This resolves an issue where getrawmempool() can race mempool
  notification signals. Intuitively we use mempool.cs as a "read
  lock" on the mempool with cs_main being the write lock, so holding
  the read lock intermittently while doing write operations is
  somewhat strange.

  This also avoids the introduction of cs_main in getrawmempool()
  which reviewers objected to in the previous fix in #12273

Tree-SHA512: 29464b9ca3890010ae13b7dc1c53487cc2bc9c3cf3d32a14cb09c8aa33848f57959d8991ea096beebcfb72f062e4e1962f104aefe4252c7db87633bbfe4ab317
laanwj pushed a commit that referenced this pull request Feb 8, 2018

Github-Pull: #12368
Rebased-From: 85aa839
Tree-SHA512: 90a505a96cecc065e8575d816f3bb35040df8672efc315f45eb3f2ea086e8ea6ee2c99eed03d0fe2215c8d3ee947a7b120e3c57a25185d03550c9075573ab032
hkjn pushed a commit to hkjn/bitcoin that referenced this pull request Feb 12, 2018
HashUnlimited pushed a commit to chaincoin/chaincoin that referenced this pull request Mar 16, 2018
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Mar 15, 2020
@bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021