Skip to content

Conversation

@jonatack
Copy link
Member

Addresses review feedback by Marco Falke in #22932 (comment):

  1. add cs_main thread safety annotation to CBlockTreeDB::WriteBatchSync(), which allows us to

  2. remove the cs_main lock in CDiskBlockIndex::SERIALIZE_METHODS with respect to its 2 callers in the codebase, CBlockTreeDB::LoadBlockIndexGuts() and CBlockTreeDB::WriteBatchSync(), that already hold the lock and whose callers hold the lock.

I didn't manage to remove the thread safety warnings generated by the fuzz tests (suggestions welcome) and resolved them here with a NO_THREAD_SAFETY_ANALYSIS annotation and explanatory documentation.

This allows us to remove the cs_main lock in CDiskBlockIndex::SERIALIZE_METHODS
with the remaining thread safety warnings generated only by the tests.
@luke-jr
Copy link
Member

luke-jr commented Jan 31, 2022

Concept NACK. Prefer recursive locks over fragile assumptions about the rest of the codebase.

@maflcko
Copy link
Member

maflcko commented Jan 31, 2022

Concept NACK. Prefer recursive locks over fragile assumptions about the rest of the codebase.

My point was that the lock isn't needed to be recursively taken during serialization since the data is copied before serialization.

@luke-jr
Copy link
Member

luke-jr commented Jan 31, 2022

Specifically, I don't think we should need NO_THREAD_SAFETY_ANALYSIS - surely there's a better way?

@jonatack
Copy link
Member Author

I agree but didn't see how to update the fuzz tests to do it -- happy to update with suggestions.

@vasild
Copy link
Contributor

vasild commented Mar 7, 2022

To me it looks like that it is not just the tests that would be problematic:

  1. This CDBBatch::Write() call would internally invoke the CDiskBlockIndex serialization method:

    batch.Write(std::make_pair(DB_BLOCK_INDEX, (*it)->GetBlockHash()), CDiskBlockIndex(*it));

  2. This CDBIterator::GetValue() call would internally invoke the CDiskBlockIndex deserialization method:

    if (pcursor->GetValue(diskindex)) {

Both Write() and GetValue() are too generic and we can't tag them as requiring cs_main or acquire cs_main from within them.

So, we have some methods in the (non-test) code that invoke CDiskBlockIndex ser/deser and can't acquire cs_main or require their callers to acquire it.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 0e052eb

re: #24199 (comment)

To me it looks like that it is not just the tests that would be problematic [...]

This is all true, CDiskBlockIndex flow is a little convoluted. I think following my suggestion below to make relevant CBlockIndex member variables private & nonguarded, you could keep the CDiskBlockIndex class working by declaring it as a friend of CBlockIndex, or by changing it to use SER_READ / SER_WRITE macros with specialized Set/Get accessor methods that don't require cs_main.

// and (b) if the analysis is enabled the nStatus compiler warnings are only
// generated by the tests, as the callers in the codebase that invoke this
// serialization hold cs_main and have lock annotations and assertions.
SERIALIZE_METHODS(CDiskBlockIndex, obj) NO_THREAD_SAFETY_ANALYSIS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Drop lock and thread safety analysis in CDiskBlockIndex serialization" (0e052eb)

It'd be nice to do this without disabling thread safety checks. Though I can see it would require more work, and maybe better to do separately or as a followup. (IMO ideally we would have a linter to make sure every NO_THREAD_SAFETY_ANALYSIS usage was accompanied by a link to a github issue where we could discuss the plan to reenable thread safety checks.)

In this case it looks like most obvious way of removing NO_THREAD_SAFETY_ANALYSIS would be to replace public guarded variables with private unguarded variables, changing:

public:
    int nFile GUARDED_BY(::cs_main){0};
    unsigned int nDataPos GUARDED_BY(::cs_main){0};
    unsigned int nUndoPos GUARDED_BY(::cs_main){0};
    uint32_t nStatus GUARDED_BY(::cs_main){0};

to:

private:
    int m_file_num = 0;
    unsigned int m_data_pos = 0;
    unsigned int m_undo_pos = 0;
    uint32_t m_status = 0;

And add EXCLUSIVE_LOCKS_REQUIRED accessor methods as needed to ensure variables are always accessed with the right locks held.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how would that be different than the current code?

current: LOCK(cs_main); read nStatus
proposed: LOCK(cs_main); call GetNStatus()

and the ser/deser methods would still need to acquire cs_main, no?

The access to all nStatus/nDataPos/nUndoPos from within ser/deser has to happen in one atomic block. I mean that this is not ok:

lock
read nStatus
unlock
lock
read nDataPos
unlock

it has to be

lock
read nStatus
read nDataPos
unlock

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All CDiskBlockIndex are short-lived copied data on the stack, shared with no other thread, so they don't need any locks. If there is a missing lock then it can at most be missed while copying the data.

Copy link
Contributor

@vasild vasild Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanofsky, now I see your point - to change the members of the parent class CBlockIndex to be not-tagged as guarded by cs_main and change them to private, in the hopes that the methods that touch them respect the implicit guarding. Then, access them without guarding from the child class CDiskBlockIndex. I do not like this much because the members in CBlockIndex need to be guarded, but will be without a GUARDED_BY() tag. Hmm...

@MarcoFalke, I did not realize this before, thanks for pointing it out!

If there is a missing lock then it can at most be missed while copying the data

there is not - both users of CDiskBlockIndex own cs_main when creating such objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoFalke, I did not realize this before, thanks for pointing it out!

I pointed it only out thrice up to now 😅 . #24199 (comment) and #22932 (comment)

Copy link
Contributor

@ryanofsky ryanofsky Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like this much because the members in CBlockIndex need to be guarded, but will be without a GUARDED_BY() tag.

I should probably explain reasoning behind what I suggested yesterday. I don't think this is a question what anybody likes or doesn't like, but a question of what's right and wrong. It's wrong to annotate these struct members as being guarded by cs_main, because they aren't guarded by it. The whole struct is not even guarded by cs_main, just certain instances of it are. What is guarded by cs_main is the m_block_index map and pointers & references to CBlockIndex instances specifically inside that map.

The correct way to annotate this code would be not to annotate anything in CBlockIndex (Ideally I would rename it to struct BlockInfo and make it plain struct with no methods), but to not expose this low-level struct to high level application code. Higher level code would use a BlockReference accessor class that has a BlockInfo& member and methods appropriately annotated with EXCLUSIVE_LOCKS_REQUIRED(cs_main) to enforce the access pattern which enables our lock usage. CBlockIndex* pointers throughout the code would be replaced by BlockReference objects. In my ideal world, this would look something like:

struct BlockInfo {
   uint256 hash;
   int height = 0;
   BlockInfo* prev = nullptr;
   /// ... more members here ... ///
};

struct BlockState : public BlockInfo {
  int flags = 0;
  int file_num = -1;
  unsigned data_pos = 0;
  unsigned undo_pos = 0;
};

class BlockReference {
private:
   BlockState& m_block;
public:
   BlockInfo& info() const { return m_block; }
   BlockState& state() EXCLUSIVE_LOCKS_REQUIRED(cs_main) { return m_block; }
   BlockReference prev() const { return {static_cast<BlockState&>(*Assert(m_block.prev))}; }
   // Could add more fine-grained accessors like height() or flags() or whatever.
};

class BlockManager {
private:
  std::unordered_set<BlockState> m_blocks GUARDED_BY(cs_main);
public:
  BlockReference FindBlock(const uint256& hash) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
   /// ... more stuff ... ///
}

The appoach I suggested yesterday does the same thing but keeps everything mushed into the same CBlockIndex class/struct hybrid instead of splitting it out. It would be pretty easy to implement what I suggested yesterday as a followup to this PR to re-enable thread safety checking. In any case I was not suggesting doing something bigger like that in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to use another wording (not like/dislike):

It is a design smell to have a variable without GUARDED_BY() and rely on EXCLUSIVE_LOCKS_REQUIRED() on some methods to protect it. This way, if the code goes wrong (like, new access is added to the variable from a function not marked with EXCLUSIVE_LOCKS_REQUIRED()) then the compiler has no way to detect that. This defeats the purpose of thread safety annotations which is to enforce proper access and detect mistakes if the code goes wrong.

Your suggestion above does that:

replace public guarded variables with private unguarded variables ... And add EXCLUSIVE_LOCKS_REQUIRED accessor methods as needed to ensure variables are always accessed with the right locks held.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review feedback! Given the number of pulls that are currently touching this area of the codebase, it seems good to keep this change minimal and perhaps propose these ideas as a follow-up (happy to review them).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a design smell to have a variable without GUARDED_BY() and rely on EXCLUSIVE_LOCKS_REQUIRED() on some methods to protect it. This way, if the code goes wrong (like, new access is added to the variable from a function not marked with EXCLUSIVE_LOCKS_REQUIRED()) then the compiler has no way to detect that. This defeats the purpose of thread safety annotations which is to enforce proper access and detect mistakes if the code goes wrong.

These are mostly incorrect claims made without specific reasoning or justification. If you have an inner variable that does not require lucking, and an outer data structure containing the variable, or an outer pointer pointing to the variable, or an outer function accessing the variable, and the outer container/pointer/function does require locking, the correct way of annotating the code is to not annotate the variable, and to annotate the outer container/pointer/function.

If you look at the actual code I posted in #24199 (comment), you can see that the compiler will detect all unsafe accesses. If you want to talk about a code smell, NO_THREAD_SAFETY_ANALYSIS is the code smell, and the design in that post shows how to get rid of it.

That said, there is a difference between the bigger change I posted yesterday #24199 (comment), and the smaller suggestion #24199 (comment) I posted two days ago. In both cases the compiler can guarantee thread safety outside of chain.h/chain.cpp if these files are annotated correctly. But in the bigger change where I split up CBlockIndex class into separate BlockInfo / BlockState / BlockReference components, it's easy to annotate everything correctly, while in the smaller change where CBlockIndex is kept as a monolithic class, what you are saying about it being trickier to see notice missing EXCLUSIVE_LOCKS_REQUIRED annotations on CBlockIndex methods is true.

If you want to argue that using incorrect annotations is better than using correct ones for practical reasons (e.g. it is better to use NO_THREAD_SAFETY_ANALYSIS in one method, so the rest of the file is easier to maintain), that's fine. But let's not create confusion about what correct annotations are and what incorrect annotations are. Correct annotations do not require using NO_THREAD_SAFETY_ANALYSIS, and that's just the most obvious clue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt NO_THREAD_SAFETY_ANALYSIS I tend to agree with @luke-jr: #24199 (comment). This is why I did not ACK this PR, which contains NO_THREAD_SAFETY_ANALYSIS. Personally, I do not see a good and elegant solution to the problem this PR aims to fix (yet).

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 17, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky
Concept NACK luke-jr

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25862 (refactor, kernel: Remove gArgs accesses from dbwrapper and txdb by ryanofsky)
  • #19463 (Prune locks by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@achow101
Copy link
Member

Are you still working on this?

@achow101
Copy link
Member

Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.

@achow101 achow101 closed this Nov 10, 2022
@jonatack
Copy link
Member Author

Are you still working on this?

Yes, sorry not to have seen this comment and replied. There is valuable review feedback here so it may be helpful to re-open here rather than open a new pull (mentioning as I do not have the privileges to re-open it.)

@maflcko maflcko reopened this Nov 10, 2022
@DrahtBot DrahtBot mentioned this pull request Nov 16, 2022
@fanquake fanquake marked this pull request as draft February 6, 2023 15:02
@fanquake
Copy link
Member

fanquake commented Feb 6, 2023

Moved this to draft. It's not clear what the status of this is. There is clearly disagreement about the approach amongst contributors. Was re-opened to address feedback, but no activity in the 4 months since.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

@jonatack
Copy link
Member Author

Closing temporarily so that I can re-open it -- am still interested to work on this.

@jonatack jonatack closed this Apr 25, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Apr 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants