-
Notifications
You must be signed in to change notification settings - Fork 38.7k
wallet: fix deadlock in bdb read write operation #27556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
theStack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look closer at the changes yet, but can confirm that running the newly introduced unit test via
$ ./src/test/test_bitcoin --log_level=all --run_test=walletdb_tests/walletdb_read_write_deadlock
succeeds on the PR branch head and hangs if the bugfix commit bda562ddba13ee8f94147734ab9901ab07357f2f is reverted.
|
ACK d3419f4108669d37a4ccecab77750e70c2f6237c |
|
Backport shouldn't be required as I don't think there's any user action that can trigger the deadlock. |
d3419f4 to
de202f8
Compare
|
Updated per feedback. Thanks achow101. Only diff is |
|
ACK de202f860b3f79774d5235dec59a140fa5c82fa7 |
|
Concept ACK. |
|
Does this PR makes bitcoin/test/sanitizer_suppressions/tsan Line 28 in 137a98c
|
hebasto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK de202f860b3f79774d5235dec59a140fa5c82fa7.
@hebasto hmm, interesting. It seems to be outdated in master. Just ran the test suite there, without the suppression, and it passed. |
hebasto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK de202f860b3f79774d5235dec59a140fa5c82fa7, tested on Ubutnu 23.04.
To replicate it, need bdb support and drop the first commit. The test will run forever.
The test is deadlocked with the second commit dropped as well.
…yCursor If the batch ptr is not passed, the cursor will not use the db active txn context which could lead to a deadlock if the code tries to modify the db while it is traversing it. E.g. the 'EraseRecords()' function.
so we erase all the records atomically or abort the entire procedure. and, at the same time, we can share the same db txn context for the db cursor and the erase functionality. extra note from the Db.cursor doc: "If transaction protection is enabled, cursors must be opened and closed within the context of a transaction" thus why added a `CloseCursor` call before calling to `TxnAbort/TxnCommit`.
de202f8 to
69d4390
Compare
|
rebased, conflicts solved. |
hebasto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 69d4390
|
ACK 69d4390 |
69d4390 test: add coverage for wallet read write db deadlock (furszy) 12daf6f walletdb: scope bdb::EraseRecords under a single db txn (furszy) 043fcb0 wallet: bugfix, GetNewCursor() misses to provide batch ptr to BerkeleyCursor (furszy) Pull request description: Decoupled from bitcoin#26644 so it can closed in favor of bitcoin#26715. Basically, with bdb, we can't make a write operation while we are traversing the db with the same db handler. These two operations are performed in different txn contexts and cause a deadlock. Added coverage by using `EraseRecords()` which is the simplest function that executes this process. To replicate it, need bdb support and drop the first commit. The test will run forever. ACKs for top commit: achow101: ACK 69d4390 hebasto: re-ACK 69d4390 Tree-SHA512: b3773be78925f674e962f4a5c54b398a9d0cfe697148c01c3ec0d68281cc5c1444b38165960d219ef3cf1a57c8ce6427f44a876275958d49bbc0808486e19d7d
Basically, with bdb, we can't make a write operation while we are traversing the db with the same db handler. These two operations are performed in different txn contexts and cause a deadlock.
Added coverage by using
EraseRecords()which is the simplest function that executes this process.To replicate it, need bdb support and drop the first commit. The test will run forever.