-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix locks in db.reloadBlocks() #12920
Conversation
But that only confirms that I did check that every function calling I think we should remove What do you think @jesusvazquez ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, from your explanation and my understanding of the code
c5c3e70
to
1b8cf6a
Compare
I think I've updated all places that read |
1b8cf6a
to
2a22799
Compare
Are we ok just removing the test? I read through the comments and it sounds like you hoped to find a way to test the race. |
Im not sure if that’s a question for me or some reviewer, but I do believe that this test is not valid and should be removed (as per my previous comment). |
Hello from the bug-scrub! It looks like this was approved and ready to go, but now it has a small merge conflict. I'll also link #14610 which is touching a similar area. |
There are no approvals here at the moment and the last comment was a bit vague so I'm not sure if it was for me or someone else to answer. |
Approvals get removed when a change is pushed. |
rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see the functions involved in this PR all document whether they need certain lock(s) on entry. Like is already done in a few places saying // The compaction mutex should be held before calling this method.
(although I would spell that db.cmtx
.
Do you know the lock hierarchy between DB.mtx
, DB.cmtx
and DB.autoCompactMtx
? I.e. which ones you must lock first, if you're going to lock more than one of them?
This partially reverts ae3d392. ae3d392 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks(), previous db.mtx would be locked only during critical part of db.reloadBlocks(). The motivation was to protect against races: prometheus@9e0351e#r555699794 The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods. TestTombstoneCleanRetentionLimitsRace was added to catch this but I wasn't able to ever get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones(). To make things more complicated CleanupTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks(). The problem with this change is that db.reloadBlocks() can take a very long time, that's because it might need to load very large blocks from disk, which is slow. While db.mtx is locked a large chunk of the db is locked, including queries, since db.mtx read lock is needed for db.Querier() call. One of the issues this manifests itself as is a gap in all metrics and blocked queries just after a large block compaction happens. When compaction merges multiple day-or-more blocks into a week-or-more block it create a single very big block. After that block is written it needs to be loaded and that seems to be taking many seconds (30-45), during which mtx is held and everything is blocked. Turns out that there is another lock that is more fine grained and aimed at this specific use case: // cmtx ensures that compactions and deletions don't run simultaneously. cmtx sync.Mutex All calls to reloadBlocks() are wrapped inside cmtx lock. The only exception is db.reload() which this change fixes. We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx. Looking at the code I think it is sufficient to hold cmtx and skip a reloadBlocks() wide mtx call. Signed-off-by: Łukasz Mierzwa <[email protected]>
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race. The problem is that CleanTombstones() is a public method while reloadBlocks() is internal. CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all, it expects the public method through which it was called to do it. So having a race between these two is not unexpected and we shouldn't really be testing this. db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen. If it would happen it would be only because some other method doesn't aquire db.ctmx lock, something this test cannot detect. Signed-off-by: Łukasz Mierzwa <[email protected]>
We don't hold db.mtx lock when trying to read db.blocks here so we need a read lock around this loop. Signed-off-by: Łukasz Mierzwa <[email protected]>
Compact() is an uppercase function that deals with locks on its own, so we shouldn't have a lock around it. Signed-off-by: Lukasz Mierzwa <[email protected]>
Signed-off-by: Lukasz Mierzwa <[email protected]>
I've added it where missing, plus changed
I do not know that, sorry. All I know is what I can see in the code and on ae3d392. |
Test failure seems unrelated:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delay; I think we can go ahead with this.
This partially reverts ae3d392.
ae3d392 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks(), previous db.mtx would be locked only during critical part of db.reloadBlocks(). The motivation was to protect against races:
9e0351e#r555699794 The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods. TestTombstoneCleanRetentionLimitsRace was added to catch this but I wasn't able to ever get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones(). To make things more complicated CleanupTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().
The problem with this change is that db.reloadBlocks() can take a very long time, that's because it might need to load very large blocks from disk, which is slow. While db.mtx is locked a large chunk of the db is locked, including queries, since db.mtx read lock is needed for db.Querier() call. One of the issues this manifests itself as is a gap in all metrics and blocked queries just after a large block compaction happens. When compaction merges multiple day-or-more blocks into a week-or-more block it create a single very big block. After that block is written it needs to be loaded and that seems to be taking many seconds (30-45), during which mtx is held and everything is blocked.
Turns out that there is another lock that is more fine grained and aimed at this specific use case:
// cmtx ensures that compactions and deletions don't run simultaneously. cmtx sync.Mutex
All calls to reloadBlocks() are wrapped inside cmtx lock. The only exception is db.reload() which this change fixes. We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx.
Looking at the code I think it is sufficient to hold cmtx and skip a reloadBlocks() wide mtx call.