rpc: Add back missing cs_main lock in getrawmempool #12273
Conversation
TheBlueMatt left a comment:
A bit more background - we (essentially) use mempool.cs as a read lock and cs_main as a write lock for the mempool. However, we don't wait for cs_main in getrawmempool, so you can have a situation where you start an ATMP call, it holds cs_main, and then getrawmempool is completely unsynchronized with anything going on, confusing some tests. Ideally we'd move away from the rather-confused read-write-but-not-upgradeable mempool.cs, but a simple `{ LOCK(cs_main); }` should fix the issue for now.
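To make the race concrete, here is a minimal standalone analogy (not Bitcoin Core code; `main_cs`, `pool_cs`, `atmp_writer`, and `getrawmempool_reader` are invented names for illustration):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

std::mutex main_cs;     // plays the role of cs_main: held by the ATMP "writer"
std::mutex pool_cs;     // plays the role of mempool.cs: taken briefly per access
std::vector<int> pool;  // stand-in for the mempool contents

void atmp_writer(int tx) {
    std::lock_guard<std::mutex> write(main_cs); // the whole "write" runs under main_cs
    {
        std::lock_guard<std::mutex> read(pool_cs);
        pool.push_back(tx); // mempool state updated...
    }
    // ...but follow-up work (e.g. firing notifications) is still pending
    // under main_cs. A reader that only takes pool_cs can interleave here.
}

std::size_t getrawmempool_reader() {
    // The bug: nothing here waits on main_cs, so this read can run in the
    // middle of atmp_writer() and observe a half-finished "write".
    std::lock_guard<std::mutex> read(pool_cs);
    return pool.size();
}
```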
The review thread below is on this hunk in getrawmempool:

```cpp
// Wait for ATMP calling thread to release the write lock:
LOCK(cs_main);

return mempoolToJSON(fVerbose);
```
This needs to go before mempoolToJSON, no? ie just do a `{ LOCK(cs_main); }`.
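In the analogy above, the scoped barrier Matt names would look like this (a sketch, not the actual patch):

```cpp
std::size_t getrawmempool_reader_scoped() {
    {
        std::lock_guard<std::mutex> barrier(main_cs); // wait out any in-flight writer
    } // released immediately: the point is to synchronize, not to hold the lock
    std::lock_guard<std::mutex> read(pool_cs);
    return pool.size();
}
```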
Acquiring the lock and immediately giving it back to the ATMP calling thread will not solve any races. (At least on my machine I can still see them.)
I guess I could hold it for the whole duration of getrawmempool, if that is what you'd like.
Ah, oops, yes, sorry, indeed, wrong direction. I guess this is fine for now, but we really need to kill the mempool.cs/cs_main garbage.
I don't understand why it matters to wait for cs_main when the result is already determined.
We want to wait for

```cpp
GetMainSignals().TransactionAddedToMempool(ptx);
```

(line 974 at 8470e64), which happens under cs_main, not pool.cs.

This was hit twice on travis, you might find one of the logs here https://travis-ci.org/bitcoin/bitcoin/jobs/332197296 (might disappear soon, on reset)
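In the same analogy, the key point is that the notification fires while the writer still holds `main_cs`, so blocking on `main_cs` is what blocks on the notification (`notify_added` is an invented stand-in for `TransactionAddedToMempool`):

```cpp
#include <functional>

std::function<void(int)> notify_added; // stand-in for the validation signal

void atmp_writer_with_signal(int tx) {
    std::lock_guard<std::mutex> write(main_cs);
    {
        std::lock_guard<std::mutex> read(pool_cs);
        pool.push_back(tx);
    }
    if (notify_added) notify_added(tx); // fired under main_cs, not pool_cs
} // only once main_cs is released here has the notification definitely fired
```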
Why not just use …

@promag Pull request welcome, but I'd prefer not to do major refactoring in a bug-fix pull request.
Thanks for the explanation @MarcoFalke. IMHO, and to cover other possible cases, `cs_main` could be locked right before returning in `SyncWithValidationInterfaceQueue`:

```cpp
void SyncWithValidationInterfaceQueue() {
    AssertLockNotHeld(cs_main);
    // Block until the validation queue drains
    std::promise<void> promise;
    CallFunctionInValidationInterfaceQueue([&promise] {
        promise.set_value();
    });
    promise.get_future().wait();
    // Block until other tasks holding cs_main finish
    LOCK(cs_main);
}
```

Or in the `syncwithvalidationinterface` RPC after `SyncWithValidationInterfaceQueue()`.

This alternative acknowledges it's not a `getrawmempool` bug.
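A sketch of the second placement promag mentions, in the `syncwithvalidationinterface` RPC itself (the handler's exact name, signature, and registration are assumptions here, not shown in the thread):

```cpp
static UniValue syncwithvalidationinterface(const JSONRPCRequest& request)
{
    // First drain the validation queue...
    SyncWithValidationInterfaceQueue();
    // ...then wait for whichever task (e.g. ATMP) still holds cs_main.
    LOCK(cs_main);
    return NullUniValue;
}
```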
No, we should not take a cs_main in SyncWithValidationInterface. The bug here is mempool.cs - we're using it as a read lock but not holding cs_main during the whole "write" in ATMP. The fix here is fine - sync getrawmempool with ATMP (as it should be). Alternatively we could hold cs_main for longer in ATMP, but adding a cs_main to validationinterface to fix a mempool-specific bug is way overkill.
Medium-term (ie post-0.16) we should look into making cs_main a read/write/upgrade lock for real and then this should fall out.
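Continuing the analogy, the medium-term direction might look roughly like the following, using std::shared_mutex purely as an illustrative stand-in (it is not upgradeable and is not what Bitcoin Core uses; a true upgrade lock would let a shared holder become exclusive without releasing):

```cpp
#include <shared_mutex>

std::shared_mutex mempool_rw; // one lock: shared for readers, exclusive for the writer

std::size_t read_pool_size() {
    std::shared_lock<std::shared_mutex> r(mempool_rw); // many concurrent readers
    return pool.size();
}

void atmp_write(int tx) {
    std::unique_lock<std::shared_mutex> w(mempool_rw); // exclusive for the whole write
    pool.push_back(tx);
    if (notify_added) notify_added(tx); // notification still inside the write
}
```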
The question for me is whether the production (i.e. non-test) use of …

Also, looking at the code, it does seem like cs_main is held for all of ATMP, so I don't understand the comment from @TheBlueMatt. It seems to me that …
The idea is to guarantee the result contains transactions that have already been processed and are not still in the queue waiting to be processed. Acquiring the lock before returning means that ATMP has finished.
How does the user of … (we can move to IRC if this is too much back and forth)
@TheBlueMatt explains above in #12273 (review). I don't think this is a production/mainnet concern.
I understand the comment you linked. It still seems a bit better to solve the test issue by synchronizing an RPC call that is never used in production rather than …
I disagree wholly that this is not a "production concern" - we could absolutely have users who are doing a getrawmempool and then calling wallet functions based on the result, introducing a new race for them. I believe we should be marking this for 0.16 as a regression.
Wouldn't all such followup actions take …
Err, sorry, not wallet calls: sendrawtransaction followed by a getrawmempool to verify it's there - if we can hit a realistic race in testing we should probably consider it a potential production issue.
One naive usage (which is already at least somewhat racy, but...) would be to do a getrawmempool to cheaply look up the fee of a transaction you just sent. Not a massive deal, but definitely a regression worth fixing.
With the proposed fix, …
Oh, I seem to have confused myself and misremembered the issue here - you cannot hit this purely from RPC, but you can hit wallet errors: …
Changed milestone to 0.16 |
There has been too much discussion around this simple fix. Closing in favor of #12368 |
02fc886 Add braces to meet code style on line-after-the-one-changed. (Matt Corallo)
85aa839 Hold mempool.cs for the duration of ATMP. (Matt Corallo)

Pull request description:

This resolves an issue where getrawmempool() can race mempool notification signals. Intuitively we use mempool.cs as a "read lock" on the mempool with cs_main being the write lock, so holding the read lock intermittently while doing write operations is somewhat strange. This also avoids the introduction of cs_main in getrawmempool() which reviewers objected to in the previous fix in #12273

Tree-SHA512: 29464b9ca3890010ae13b7dc1c53487cc2bc9c3cf3d32a14cb09c8aa33848f57959d8991ea096beebcfb72f062e4e1962f104aefe4252c7db87633bbfe4ab317
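In the running analogy, #12368 moves the synchronization into the writer rather than the reader (a sketch; the real change is inside AcceptToMemoryPool, and keeping the notification under the lock is an assumption here):

```cpp
void atmp_writer_12368(int tx) {
    std::lock_guard<std::mutex> write(main_cs);
    // Hold the "read lock" for the whole operation, so a reader like
    // getrawmempool_reader() can never interleave mid-write.
    std::lock_guard<std::mutex> read(pool_cs);
    pool.push_back(tx);
    if (notify_added) notify_added(tx);
}
```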
The `getrawmempool` RPC should wait for ATMP to completely return before sending back the pool contents. Otherwise, the `syncwithvalidationinterface` RPC might race against ATMP and be a no-op, even though it shouldn't.

When writing to the mempool in ATMP, the `cs_main` lock is acquired, so we can wait for the release of `cs_main` to be sure that ATMP is done.

Effectively reverts #8244