Fix potential races in the volume store #17185
Conversation
|
@cpuguy83 is this fix needed for 1.9? |
|
@thaJeztah I hadn't thought about it... wasn't planning on adding 300 LOC to an RC... ping @coolljt0725 who ran into this while using convoy. |
|
Well, I hope not (indeed quite some LOC), but /cc @tiborvass as release captain, just in case. @cpuguy83 should this be two commits? One for adding the locker pkg, and one for updating the volumes code? |
|
@cpuguy83 My only concern is that a |
|
@coolljt0725 We really can't return early unless we do a timeout or something. |
pkg/locker/locker.go
maybe a typo here: `waitings wating` should be `waiters waiting`, like above?
|
Looks like this needs a rebase 😢 |
c167b57 to
ae69ec8
Compare
|
rebased |
|
LGTM |
pkg/locker/README.md
weird spacing for go code I think
ah yeah, let me be lazy and write this in a .go file first so gofmt gets run :)
a05d2c0 to
8db657f
Compare
pkg/locker/locker.go
As I see it, this can be called concurrently with lock.Lock(), and it would be possible to hold separate locks for the same name if the lock is deleted from the map before inc is called.
Probably we need something simpler.
Since this is called while holding the locker's main lock, it cannot be called in parallel from outside this package.
Actually, what it probably needs to do is call inc inside the main lock instead of in the lockctr itself.
diff --git a/pkg/locker/locker.go b/pkg/locker/locker.go
index b62bafe..ec2d6da 100644
--- a/pkg/locker/locker.go
+++ b/pkg/locker/locker.go
@@ -52,9 +52,7 @@ func (l *lockCtr) count() uint32 {
// Lock locks the mutex
func (l *lockCtr) Lock() {
- l.inc()
l.mu.Lock()
- l.dec()
}
// Unlock unlocks the mutex
@@ -75,29 +73,33 @@ func (l *Locker) Lock(name string) {
if l.locks == nil {
l.locks = make(map[string]*lockCtr)
}
- lock, exists := l.locks[name]
+
+ nameLock, exists := l.locks[name]
if !exists {
- lock = &lockCtr{}
- l.locks[name] = lock
+ nameLock = &lockCtr{}
+ l.locks[name] = nameLock
}
+
+ nameLock.inc()
l.mu.Unlock()
- lock.Lock()
+ nameLock.Lock()
+ nameLock.dec()
}
// Unlock unlocks the mutex with the given name
// If the given lock is not being waited on by any other callers, it is deleted
func (l *Locker) Unlock(name string) error {
l.mu.Lock()
- lock, exists := l.locks[name]
+ nameLock, exists := l.locks[name]
if !exists {
l.mu.Unlock()
return ErrNoSuchLock
}
- if lock.count() == 0 {
+ if nameLock.count() == 0 {
delete(l.locks, name)
}
- lock.Unlock()
+ nameLock.Unlock()
l.mu.Unlock()
return nil

Uses finer-grained locking so that each volume name gets its own lock rather than only being protected by the global lock, which itself needs to be unlocked during certain operations (`create` especially).

Signed-off-by: Brian Goff <[email protected]>
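Read as a whole, the patch above amounts to a small standalone package. A minimal self-contained sketch of the same per-name locking pattern (field and method names simplified; not the exact PR code) might look like:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// ErrNoSuchLock is returned when Unlock is called on a name that is not held.
var ErrNoSuchLock = errors.New("no such lock")

// lockCtr pairs a mutex with a count of goroutines waiting on it.
type lockCtr struct {
	mu      sync.Mutex
	waiters int32
}

// Locker hands out one mutex per name, deleting it once nobody waits on it.
type Locker struct {
	mu    sync.Mutex
	locks map[string]*lockCtr
}

func New() *Locker {
	return &Locker{locks: make(map[string]*lockCtr)}
}

// Lock acquires the mutex for name, creating it on first use.
// The waiter count is incremented while still holding the main lock,
// which is the fix discussed in the review above.
func (l *Locker) Lock(name string) {
	l.mu.Lock()
	nameLock, exists := l.locks[name]
	if !exists {
		nameLock = &lockCtr{}
		l.locks[name] = nameLock
	}
	atomic.AddInt32(&nameLock.waiters, 1) // count ourselves before releasing l.mu
	l.mu.Unlock()

	nameLock.mu.Lock()
	atomic.AddInt32(&nameLock.waiters, -1) // we hold it now; no longer waiting
}

// Unlock releases the mutex for name, deleting it if nobody else is waiting.
func (l *Locker) Unlock(name string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	nameLock, exists := l.locks[name]
	if !exists {
		return ErrNoSuchLock
	}
	if atomic.LoadInt32(&nameLock.waiters) == 0 {
		delete(l.locks, name)
	}
	nameLock.mu.Unlock()
	return nil
}

func main() {
	l := New()
	l.Lock("volume-a")
	fmt.Println(l.Unlock("volume-a")) // <nil>
	fmt.Println(l.Unlock("volume-a")) // no such lock
}
```

Two goroutines contending on the same name serialize on that name's mutex, while operations on different names proceed independently; the global `l.mu` is only held long enough to look up or create the per-name entry.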
8db657f to
fe66fdd
Compare
|
@LK4D4 Updated, I think this will take care of the issue. PTAL. |
Theoretically nothing is wrong for your use case, but it's possible to return this error even for a lock you Locked yourself, when dec races with Unlock.
E.g. you dec in Lock while, at the same time, another goroutine calls Unlock and removes the lock from the map.
Having a hard time seeing this one.
The worst case I can see is that it might leave an extra/unused lock in the map, but I can't see where it would return this error for something we've locked.
Yeah, looking at this again, I don't see it :( maybe you can explain on IRC?
So, you got the Lock, and now the waiter count is zero. You start Unlock; at the same time another goroutine does Lock(). You remove the lock from the map, so the other goroutine will get an error on its Unlock. Maybe it's better to drop the error from the interface.
But Lock/Unlock are protected by a mutex.
If someone calls Unlock on a name it doesn't actually hold the lock for, I think that's a valid error (and would generate a fatal run-time error under a normal sync.Mutex).
dec() is not protected. Lock() -> Unlock() + Lock{dec()} -> Unlock() with error for valid Lock().
The 2nd lock request wouldn't be at dec() because at that point the nameLock is still locked, so it's still hanging on L90 waiting to lock.
The count/delete is done while still holding nameLock, and if the nameLock isn't being held, then it really is a problem with how the caller is using it... but I suppose we could still protect against this by ditching the atomic counter and just using a 2nd mutex for synchronizing access here... wdyt?
No, I think it's ok. Sorry to waste your time for explaining :)
|
Also ping @stevvooe since I think you implemented something similar (though w/o the cleanup) in graph. |
|
LGTM |
…store Fix potential races in the volume store
|
@cpuguy83 @LK4D4 Indeed. This is very similar to https://github.com/docker/docker/blob/master/graph/mutex.go except it manages the reference counts. Might be good to convert graph to use this. |
|
cool, merged! thanks! |
Uses finer-grained locking so that each volume name gets its own lock
rather than only being protected by the global lock, which itself needs
to be unlocked during certain operations (`create` especially).
Extracts the locking package/store changes out from #16534
Closes #16473