Skip to content

Conversation

@nanlour
Copy link
Contributor

@nanlour nanlour commented Apr 1, 2024

I think this problem #29767 (comment) is because of
in BaseIndex::Sync

bitcoin/src/index/base.cpp

Lines 163 to 168 in 61de64d

if (!pindex_next) {
SetBestBlockIndex(pindex);
m_synced = true;
// No need to handle errors in Commit. See rationale above.
Commit();
break;

Setup m_synced = true; before Commit();
So this may cause a race condition window to BaseIndex::BlockConnected

bitcoin/src/index/base.cpp

Lines 271 to 274 in 61de64d

// Ignore BlockConnected signals until we have fully indexed the chain.
if (!m_synced) {
return;
}

So i try to fix it with move m_synced = true after Commit().
Also see comment of Sync():

bitcoin/src/index/base.h

Lines 151 to 156 in 61de64d

/// Sync the index with the block index starting from the current best block.
/// Intended to be run in its own thread, m_thread_sync, and can be
/// interrupted with m_interrupt. Once the index gets in sync, the m_synced
/// flag is set and the BlockConnected ValidationInterface callback takes
/// over and the sync thread exits.
void Sync();

I am a newcomer interested in Bitcoin, trying to become a member of the Bitcoin Core development team. Please give me some feedback if you could, as I may be doing something wrong. Thank you!

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 1, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky, fjahr
Concept ACK mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@fanquake fanquake changed the title ThreadSanitizer: Fix #29767, ThreadSanitizer: data race src/flatfile.cpp:47:13 in FlatFileSeq::Open(FlatFilePos const&, bool) ThreadSanitizer: Fix #29767 Apr 1, 2024
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setup m_synced = true; before Commit();
So this may cause a race condition window to BaseIndex::BlockConnected

If that's the case, sleeping the thread after setting m_synced and before Commit() should trigger the failure. Have you tried it?

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Seems like this is not just a theoretical ThreadSanitizer error but a bug that could lead to index corruption in the case of unfortunate timing between the threads running Sync() and BlockConnected().

@Sjors
Copy link
Member

Sjors commented Apr 2, 2024

The change to bbe82c1 in index/base.cpp looks safe to me.

cc @fjahr

@nanlour
Copy link
Contributor Author

nanlour commented Apr 2, 2024

@furszy

If that's the case, sleeping the thread after setting m_synced and before Commit() should trigger the failure. Have you tried it?

Yes, I have tried to recreate this bug with the original feature_assumeutxo test from #29767, but simply sleeping between setting m_synced and Commit() cannot trigger it. I've attempted to recreate the data race by only pause Sync thread after it set m_synced and before it call FlatFileSeq::Open, this trigger data race when run feature_assumeutxo test with a probability of around 10%.
This is my way to recreate it: 5931fb8. I am pretty sure there will be a better solution, but I can only come up with this one for now.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bbe82c1

It seems like the race condition between m_synced = true and committing has been around since 4368384 from #14121, and the race condition accessing the m_next_filter_pos variable existed then too.

@DrahtBot DrahtBot requested a review from mzumsande April 2, 2024 16:16
@fjahr
Copy link
Contributor

fjahr commented Apr 3, 2024

Code review ACK bbe82c1

This looks like the correct fix to me. CI failure is unrelated.

@ryanofsky ryanofsky merged commit 5a5ab1d into bitcoin:master Apr 4, 2024
@ryanofsky
Copy link
Contributor

Merged this since it is a small straightfoward fix, only affects indexing, has 2 acks (ryanofsky, fjahr) and positive comments from other reviewers (mzumsande, sjors, furszy)

Pttn added a commit to RiecoinTeam/Riecoin that referenced this pull request Apr 13, 2024
@luke-jr
Copy link
Member

luke-jr commented Apr 21, 2024

It seems like the race condition between m_synced = true and committing has been around since 4368384 from #14121, and the race condition accessing the m_next_filter_pos variable existed then too.

Until #28955, cs_main was held across all this to prevent a race.

Not going to look at it right now, but I suspect this fix just creates a different race instead.

@luke-jr
Copy link
Member

luke-jr commented Apr 21, 2024

Looks like furszy already got this in #29867

@nanlour
Copy link
Contributor Author

nanlour commented Apr 26, 2024

Not going to look at it right now, but I suspect this fix just creates a different race instead.
Looks like furszy already got this in #29867

I think the race condition fixed in #29867 was not introduced by my change; it was created by #28955, although my change makes the race condition window bigger.

@ryanofsky
Copy link
Contributor

Until #28955, cs_main was held across all this to prevent a race.

The race existed before #28955 because even before #28955, only the BaseIndex::Sync thread was locking cs_main. The BaseIndex::BlockConnected notification thread was never locking cs_main. So setting m_synced = true; before calling Commit(); was always unsafe, even with cs_main held because the BlockConnected thread could run in parallel and both threads could try to update index state at the same time. #28955 would make the race more likely to happen though, which was probably why it was caught more recently.

fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Apr 26, 2024
@fanquake fanquake mentioned this pull request Apr 26, 2024
@fanquake
Copy link
Member

Backported for 27.x in #29888.

fanquake added a commit that referenced this pull request May 13, 2024
bd5860b [WIP] doc: release notes for 27.x (fanquake)
475aac4 doc: add LLVM instruction for macOS < 13 (Sjors Provoost)
a995902 depends: Fix build of Qt for 32-bit platforms (laanwj)
0fcceef Fix #29767, set m_synced = true after Commit() (nanlour)
ae9a2ed sign: don't assume we are parsing a sane Miniscript (Antoine Poinsot)
a6a59cf rpc: Reword SighashFromStr error message (MarcoFalke)
364bf01 build: Fix false positive `CHECK_ATOMIC` test for clang-15 (Hennadii Stepanov)
9277793 test: Fix failing univalue float test (MarcoFalke)
5c09791 doc: archive 27.0 release notes (fanquake)
897e5af [rpc, bugfix] Enforce maximum value for setmocktime (dergoegge)
602cfd5 ci: Bump s390x to ubuntu:24.04 (MarcoFalke)
20e6e8d Change Luke Dashjr seed to dashjr-list-of-p2p-nodes.us (Luke Dashjr)
a6862c5 depends: fix mingw-w64 Qt DEBUG=1 build (fanquake)

Pull request description:

  Backports:
  * #29691
  * #29747
  * #29776
  * #29853
  * #29856
  * #29859
  * #29869
  * #29870
  * #29886
  * #29892
  * #29934
  * #29985

ACKs for top commit:
  willcl-ark:
    reACK bd5860b
  stickies-v:
    re-ACK bd5860b
  TheCharlatan:
    ACK bd5860b

Tree-SHA512: a1a40de70cf52b5fc01d9dcc772421751a18c6a48a726c4c05c0371c585a53a27902e17daed9e0d721ab7763c94bb32de05c146bd6bc73fd558edd08b31e8547
glozow pushed a commit to glozow/bitcoin that referenced this pull request May 23, 2024
glozow added a commit that referenced this pull request May 24, 2024
aa7e876 [doc] add draft release notes for 26.2rc1 (glozow)
21d9aaa p2p, bugfix: detect addnode cjdns peers in GetAddedNodeInfo() (Jon Atack)
ec5ce2f windeploy: Renew certificate (Ava Chow)
96d0e81 rpc: Reword SighashFromStr error message (MarcoFalke)
6685aff rpc: move UniValue in blockToJSON (willcl-ark)
7f45e00 depends: Fix build of Qt for 32-bit platforms (laanwj)
f9b76ba ci: Pull in qtbase5-dev instead of seperate low-level libraries (laanwj)
c587753 doc: Suggest installing dev packages for debian/ubuntu qt5 build (laanwj)
7ecdb08 ci: Bump s390x to ubuntu:24.04 (MarcoFalke)
d9ef6cf sign: don't assume we are parsing a sane Miniscript (Antoine Poinsot)
e4859c8 depends: fix mingw-w64 Qt DEBUG=1 build (fanquake)
bb46b90 Fix #29767, set m_synced = true after Commit() (nanlour)
bf5b6fc Throw error if invalid parameters passed to getnetworkhashps RPC endpoint (Jameson Lopp)
a81a922 [rpc, bugfix] Enforce maximum value for setmocktime (dergoegge)
d39ea51 Change Luke Dashjr seed to dashjr-list-of-p2p-nodes.us (Luke Dashjr)
c21bbcc [doc] archive 26.1 release notes (glozow)

Pull request description:

  Archives 26.1 release notes and adds draft release notes for 26.2rc1

  Also backports:
  - #29691
  - #29869
  - #28554
  - #29747
  - #29853
  - #29856
  - #29764
  - #29776
  - #29985
  - #30094
  - #29870
  - #30149
  - #30085

ACKs for top commit:
  stickies-v:
    re-ACK aa7e876, only changes are fixing commit msg and transifex reference
  willcl-ark:
    ACK aa7e876

Tree-SHA512: b81ba6092640de696d782114cdf43e7ed1d63ea0a3231cade30653c2743d87700e0f852a1b1fcc42ae313b2d4f004e6026ddbad87d58c2fde0a660e90026ed98
tomt1664 pushed a commit to tomt1664/elements that referenced this pull request Apr 1, 2025
bbe82c1 Fix #29767, set m_synced = true after Commit() (nanlour)

Pull request description:

  I think this problem bitcoin/bitcoin#29767 (comment) is because of
  in BaseIndex::Sync
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L163-L168
  Setup m_synced = true; before Commit();
  So this may cause a race condition window to BaseIndex::BlockConnected
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L271-L274
  So i try to fix it with move m_synced = true after Commit().
  Also see comment of Sync():
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.h#L151-L156
  I am a newcomer interested in Bitcoin, trying to become a member of the Bitcoin Core development team. Please give me some feedback if you could, as I may be doing something wrong. Thank you!

ACKs for top commit:
  fjahr:
    Code review ACK bbe82c1
  ryanofsky:
    Code review ACK bbe82c1

Tree-SHA512: 89a09498a232c87ef1e083d4cc4ed9bb15f045ad0624d5d150a87187b2b8a48a41137974dbc7ea5c37f73da90742c43259f5aa7f84b4179eb8d62033e44fa479
tomt1664 pushed a commit to tomt1664/elements that referenced this pull request Apr 21, 2025
bbe82c1 Fix #29767, set m_synced = true after Commit() (nanlour)

Pull request description:

  I think this problem bitcoin/bitcoin#29767 (comment) is because of
  in BaseIndex::Sync
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L163-L168
  Setup m_synced = true; before Commit();
  So this may cause a race condition window to BaseIndex::BlockConnected
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.cpp#L271-L274
  So i try to fix it with move m_synced = true after Commit().
  Also see comment of Sync():
  https://github.com/bitcoin/bitcoin/blob/61de64df6790077857faba84796bb874b59c5d15/src/index/base.h#L151-L156
  I am a newcomer interested in Bitcoin, trying to become a member of the Bitcoin Core development team. Please give me some feedback if you could, as I may be doing something wrong. Thank you!

ACKs for top commit:
  fjahr:
    Code review ACK bbe82c1
  ryanofsky:
    Code review ACK bbe82c1

Tree-SHA512: 89a09498a232c87ef1e083d4cc4ed9bb15f045ad0624d5d150a87187b2b8a48a41137974dbc7ea5c37f73da90742c43259f5aa7f84b4179eb8d62033e44fa479
@bitcoin bitcoin locked and limited conversation to collaborators Apr 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants