Conversation

@maflcko (Member) commented Jan 26, 2018

The getrawmempool RPC should wait for ATMP to return completely before sending back the pool contents. Otherwise, the syncwithvalidationinterface RPC might race against ATMP and be a no-op, even though it shouldn't.

When a thread calls ATMP, it acquires the cs_main lock. So we can wait for the release of cs_main to be sure that ATMP is done.

Effectively reverts #8244
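
For context, the shape of the change is to build the JSON reply first and only then block on cs_main (a sketch, not the exact diff; the temporary variable and parameter handling are illustrative):

UniValue getrawmempool(const JSONRPCRequest& request)
{
    bool fVerbose = false;
    if (!request.params[0].isNull()) fVerbose = request.params[0].get_bool();

    UniValue res = mempoolToJSON(fVerbose);
    // Wait for the ATMP calling thread to release the write lock:
    LOCK(cs_main);
    return res;
}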

@TheBlueMatt (Contributor) left a comment

A bit more background - we (essentially) use mempool.cs as a read lock and cs_main as a write lock for the mempool. However, we don't wait for cs_main in getrawmempool, so you can have a situation where you start an ATMP call, it holds cs_main, and then getrawmempool is completely unsynchronized with anything going on, confusing some tests. Ideally we'd move away from the rather-confused read-write-but-not-upgradeable mempool.cs, but a simple { LOCK(cs_main); } should fix the issue for now.
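
A sketch of the { LOCK(cs_main); } idea being suggested here (placement illustrative; this is the variant debated just below, not the eventual fix):

{
    // Scoped acquire-and-immediately-release: blocks this thread until any
    // in-flight ATMP call, which holds cs_main for its duration, completes.
    LOCK(cs_main);
}
return mempoolToJSON(fVerbose);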


    return mempoolToJSON(fVerbose);
    // Wait for ATMP calling thread to release the write lock:
    LOCK(cs_main);
@TheBlueMatt (Contributor) commented:

This needs to go before mempoolToJSON, no? I.e., just do a { LOCK(cs_main); }.

@maflcko (Member, Author) replied:

Acquiring the lock and immediately giving it back to the ATMP calling thread will not solve any races. (At least on my machine I can still see them.)

I guess I could hold it for the whole duration of getrawmempool, if that is what you'd like.

@TheBlueMatt (Contributor) replied:

Ah, oops, yes, sorry, indeed, wrong direction. I guess this is fine for now, but we really need to kill the mempool.cs/cs_main garbage.

@promag (Contributor) commented:

I don't understand why it matters to wait for cs_main when the result is already determined.

@maflcko (Member, Author) replied:

We want to wait for

    GetMainSignals().TransactionAddedToMempool(ptx);

which happens under cs_main, not pool.cs.
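
Condensed heavily, the ordering inside ATMP at the time looked roughly like this (a sketch, not the real function body; names other than the signal call are simplified):

bool AcceptToMemoryPool_sketch(CTxMemPool& pool, const CTransactionRef& ptx)
{
    LOCK(cs_main); // the "write lock", held for the whole call
    {
        LOCK(pool.cs); // the "read lock": guards the actual mempool mutation
        // ... policy checks and pool.addUnchecked(...) ...
    }
    // The notification fires after pool.cs is released but while cs_main
    // is still held - this is the step getrawmempool needs to wait for:
    GetMainSignals().TransactionAddedToMempool(ptx);
    return true;
}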

@maflcko (Member, Author) commented Jan 26, 2018

This was hit twice on Travis; you might find one of the logs here: https://travis-ci.org/bitcoin/bitcoin/jobs/332197296 (might disappear soon, on reset).

@promag (Contributor) commented Jan 27, 2018

> A bit more background - we (essentially) use mempool.cs as a read lock and cs_main as a write lock for the mempool.

Why not just use mempool.cs for both read and write? Stopping a lot of stuff just to dump the mempool sounds bad.

@maflcko (Member, Author) commented Jan 27, 2018

@promag Pull request welcome, but I'd prefer not to do major refactoring in a bug-fix pull request.

@promag (Contributor) commented Jan 27, 2018

Thanks for the explanation @MarcoFalke.

IMHO, and to cover other possible cases, cs_main could be locked right before returning in SyncWithValidationInterfaceQueue:

void SyncWithValidationInterfaceQueue() {
    AssertLockNotHeld(cs_main);
    // Block until the validation queue drains
    std::promise<void> promise;
    CallFunctionInValidationInterfaceQueue([&promise] {
        promise.set_value();
    });
    promise.get_future().wait();
    // Block until other tasks holding cs_main finish
    LOCK(cs_main);
}

Or in syncwithvalidationinterface RPC after SyncWithValidationInterfaceQueue().

This alternative acknowledges it's not a getrawmempool bug.
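
A sketch of that second placement (the RPC body is illustrative; the return value and naming merely follow the style of the surrounding RPC code):

static UniValue syncwithvalidationinterfacequeue(const JSONRPCRequest& request)
{
    // Drain the validation interface queue first ...
    SyncWithValidationInterfaceQueue();
    // ... then block until any ATMP caller still holding cs_main finishes:
    LOCK(cs_main);
    return NullUniValue;
}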

@TheBlueMatt (Contributor) commented Jan 28, 2018 via email

@devrandom commented:
The question for me is whether the production (i.e. non-test) use of getrawmempool benefits from the added synchronization. If taking the lock is only to synchronize against the test-only syncwithvalidationinterface, then it doesn't seem right to pay the lock cost in a production path.

Also, looking at the code, it does seem like cs_main is held for all of ATMP, so I don't understand the comment from @TheBlueMatt.

It seems to me that { LOCK(cs_main); } at the start of syncwithvalidationinterface should work fine (i.e. lock and immediate release).

@promag (Contributor) commented Feb 3, 2018

The idea is to guarantee that the result contains only transactions that have already been processed, not ones still waiting in the queue. Acquiring the lock before returning means that ATMP has finished.

@devrandom replied:
How does the user of getrawmempool use that synchronization guarantee? I.e., what follow-up action would be invalid if the tx is not completely processed?

(we can move to IRC if this is too much back and forth)

@promag (Contributor) commented Feb 3, 2018

@TheBlueMatt explains it above in #12273 (review).

I don't think this is a production/mainnet concern.

@devrandom replied:
I understand the comment you linked. It still seems a bit better to solve the test issue by synchronizing an RPC call that is never used in production rather than getrawmempool, which could be used in production. But the performance difference is likely very small, so it's not a strong concern.

@TheBlueMatt (Contributor) commented:
I disagree wholly that this is not a "production concern" - we could absolutely have users who do a getrawmempool and then call wallet functions based on the result, introducing a new race for them. I believe we should mark this for 0.16, as it is a regression.

@devrandom replied:
Wouldn't all such follow-up actions take cs_main, and therefore be safe?

@TheBlueMatt (Contributor) replied:
Err, sorry, not wallet calls - sendrawtransaction followed by a getrawmempool to verify it's there. If we can hit a realistic race in testing, we probably should consider it a potential production issue.

@laanwj added this to the 0.16.1 milestone Feb 6, 2018
@promag (Contributor) commented Feb 6, 2018

> Err, sorry, not wallet calls - sendrawtransaction followed by a getrawmempool to verify it's there

sendrawtransaction fails if the transaction is rejected by the mempool. But it's a regression, like you said.

@TheBlueMatt (Contributor) replied:
One naive usage (which is already at least somewhat racy, but...) would be to do a getrawmempool to cheaply look up the fee of a transaction you just sent. Not a massive deal, but definitely a regression worth fixing.

@devrandom replied:
With the proposed fix, sendrawtransaction followed by a getrawmempool is not guaranteed to return the sent transaction, since the lock is taken after the results are computed.

@TheBlueMatt (Contributor) replied:
Oh, I seem to have confused myself and mis-remembered the issue here - you cannot hit this purely from RPC, but you can hit wallet errors: if you're polling getrawmempool to wait for a transaction to appear in your mempool, you can then race ATMP and thus see a balance that is as of an old mempool, not the one you just got out of getrawmempool.

@maflcko modified the milestones: 0.16.1, 0.16.0 Feb 6, 2018
@maflcko (Member, Author) commented Feb 6, 2018

Changed milestone to 0.16

TheBlueMatt added a commit to TheBlueMatt/bitcoin that referenced this pull request Feb 6, 2018
This resolves an issue where getrawmempool() can race mempool
notification signals. Intuitively we use mempool.cs as a "read
lock" on the mempool with cs_main being the write lock, so holding
the read lock intermittently while doing write operations is
somewhat strange.
This also avoids the introduction of cs_main in getrawmempool()
which reviewers objected to in the previous fix in bitcoin#12273
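
Roughly, that approach holds the mempool lock across the whole call instead (a condensed sketch of the idea in #12368, not the actual patch):

static bool AcceptToMemoryPoolWorker_sketch(CTxMemPool& pool, const CTransactionRef& ptx)
{
    AssertLockHeld(cs_main);
    // Hold the mempool "read lock" for the duration of ATMP, so a reader
    // taking pool.cs cannot interleave between the insert and the end of
    // the call:
    LOCK(pool.cs);
    // ... checks, pool.addUnchecked(...), notifications ...
    return true;
}
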
@maflcko (Member, Author) commented Feb 6, 2018

There has been too much discussion around this simple fix.

Closing in favor of #12368

@maflcko closed this Feb 6, 2018
@maflcko deleted the Mf1801-rpcMempoolGetLock branch February 6, 2018 18:55
laanwj added a commit that referenced this pull request Feb 8, 2018
02fc886 Add braces to meet code style on line-after-the-one-changed. (Matt Corallo)
85aa839 Hold mempool.cs for the duration of ATMP. (Matt Corallo)

Pull request description:

  This resolves an issue where getrawmempool() can race mempool
  notification signals. Intuitively we use mempool.cs as a "read
  lock" on the mempool with cs_main being the write lock, so holding
  the read lock intermittently while doing write operations is
  somewhat strange.

  This also avoids the introduction of cs_main in getrawmempool()
  which reviewers objected to in the previous fix in #12273

Tree-SHA512: 29464b9ca3890010ae13b7dc1c53487cc2bc9c3cf3d32a14cb09c8aa33848f57959d8991ea096beebcfb72f062e4e1962f104aefe4252c7db87633bbfe4ab317
laanwj pushed a commit that referenced this pull request Feb 8, 2018

Github-Pull: #12368
Rebased-From: 85aa839
Tree-SHA512: 90a505a96cecc065e8575d816f3bb35040df8672efc315f45eb3f2ea086e8ea6ee2c99eed03d0fe2215c8d3ee947a7b120e3c57a25185d03550c9075573ab032
hkjn pushed a commit to hkjn/bitcoin that referenced this pull request Feb 12, 2018
HashUnlimited pushed a commit to chaincoin/chaincoin that referenced this pull request Mar 16, 2018
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Mar 15, 2020
@bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021