
Fix potential races in the volume store #17185

Merged

LK4D4 merged 1 commit into moby:master from cpuguy83:use_finer_locking_for_volume_store on Nov 6, 2015

Conversation

@cpuguy83
Member

Uses finer-grained locking so that each volume name gets its own lock
rather than only being protected by the global lock, which itself needs
to be unlocked during certain operations (`create` especially).

Extracts the locking package and volume store locking changes from #16534

Closes #16473

@thaJeztah
Member

@cpuguy83 is this fix needed for 1.9?

@cpuguy83
Member Author

@thaJeztah I hadn't thought about it... wasn't planning on adding 300 LOC to an RC...
In automated systems this could cause a problem.

ping @coolljt0725 who ran into this while using convoy.

@thaJeztah
Member

Well, I hope not (indeed quite some LOC), but

/cc @tiborvass as release captain, just in case

@cpuguy83 should this be two commits? one for adding the locker pkg, and one for updating the volumes code?

@coolljt0725
Contributor

@cpuguy83 My only concern is that while a name is locked by one docker volume create, another docker volume create with the same name will block until the previous docker volume create returns. Should we return immediately if a name is locked by another docker volume create, and throw a message like A volume name xxxx is being created by volume driver xxxx?

@cpuguy83
Member Author

@coolljt0725 We really can't return early unless we do a timeout or something.

Contributor

maybe a typo here: `waitings wating` should be `waiters waiting`, like above?

Member Author

yep

@thaJeztah
Member

Looks like this needs a rebase 😢

@cpuguy83 cpuguy83 force-pushed the use_finer_locking_for_volume_store branch from c167b57 to ae69ec8 on October 26, 2015 20:25
@cpuguy83
Member Author

rebased

@calavera
Contributor

LGTM

@calavera calavera added the status/needs-attention Calls for a collective discussion during a review session label Nov 4, 2015
Contributor

weird spacing for Go code, I think

Member Author

ah yeah, let me be lazy and write this in a .go file first so gofmt gets run :)

@cpuguy83 cpuguy83 force-pushed the use_finer_locking_for_volume_store branch 2 times, most recently from a05d2c0 to 8db657f on November 4, 2015 21:09
Contributor

As I see it, this can be called concurrently with lock.Lock(), and it'll be possible to hold separate locks for the same name when the lock is deleted from the map before inc is called.
Probably we need something simpler.

Member Author

Since this is called while holding the locker's main lock, it cannot be called in parallel from outside of this package.

Member Author

Actually, what it probably needs to do is call inc inside the main lock instead of in the lockCtr itself.

Member Author

diff --git a/pkg/locker/locker.go b/pkg/locker/locker.go
index b62bafe..ec2d6da 100644
--- a/pkg/locker/locker.go
+++ b/pkg/locker/locker.go
@@ -52,9 +52,7 @@ func (l *lockCtr) count() uint32 {

 // Lock locks the mutex
 func (l *lockCtr) Lock() {
-   l.inc()
    l.mu.Lock()
-   l.dec()
 }

 // Unlock unlocks the mutex
@@ -75,29 +73,33 @@ func (l *Locker) Lock(name string) {
    if l.locks == nil {
        l.locks = make(map[string]*lockCtr)
    }
-   lock, exists := l.locks[name]
+
+   nameLock, exists := l.locks[name]
    if !exists {
-       lock = &lockCtr{}
-       l.locks[name] = lock
+       nameLock = &lockCtr{}
+       l.locks[name] = nameLock
    }
+
+   nameLock.inc()
    l.mu.Unlock()
-   lock.Lock()
+   nameLock.Lock()
+   nameLock.dec()
 }

 // Unlock unlocks the mutex with the given name
 // If the given lock is not being waited on by any other callers, it is deleted
 func (l *Locker) Unlock(name string) error {
    l.mu.Lock()
-   lock, exists := l.locks[name]
+   nameLock, exists := l.locks[name]
    if !exists {
        l.mu.Unlock()
        return ErrNoSuchLock
    }

-   if lock.count() == 0 {
+   if nameLock.count() == 0 {
        delete(l.locks, name)
    }
-   lock.Unlock()
+   nameLock.Unlock()

    l.mu.Unlock()
    return nil

Uses finer grained locking so that each volume name gets its own lock
rather than only being protected by the global lock, which itself needs
to be unlocked during certain operations (`create` especially)

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the use_finer_locking_for_volume_store branch from 8db657f to fe66fdd on November 4, 2015 21:59
@cpuguy83
Member Author

cpuguy83 commented Nov 4, 2015

@LK4D4 Updated, I think this will take care of the issue. PTAL.

Contributor

Theoretically nothing is wrong for your use case, but it's possible that you will return this error even for a lock which you Locked yourself, when dec races with Unlock.

Contributor

Like: you dec in Lock, and at the same time another goroutine calls Unlock and removes the lock from the map.

Member Author

Having a hard time seeing this one.
The worst case I can see is that it might leave an extra/unused lock in the map, but I can't see where it would return this error for something we've locked.

Member Author

Yeah, looking at this again, I don't see it :( maybe you can explain on IRC?

Contributor

So, you got Lock, and now the waiter count is zero. You started Unlock; at the same time another goroutine did Lock(). You removed the lock from the map, so that other goroutine will get an error on its Unlock. I think it's better to exclude the error from the interface, maybe.

Member Author

But Lock/Unlock are protected by a mutex.
If someone calls Unlock on something that it doesn't actually own the lock for, I think that's a valid error (and would generate a panic under a normal sync/mutex).

Contributor

dec() is not protected. Lock() -> Unlock() + Lock{dec()} -> Unlock() with error for valid Lock().

Member Author

The 2nd lock request wouldn't be at dec() because at this point the nameLock is still locked, so it's still hanging on L90 waiting to lock.
The count/delete is done while still holding nameLock, and if the nameLock isn't being held, then it really is a problem with how the caller is using it... but I suppose we can still protect against this by ditching the atomic counter and just using a 2nd mutex for synchronizing access here... wdyt?

Contributor

No, I think it's ok. Sorry to waste your time for explaining :)

@cpuguy83
Member Author

cpuguy83 commented Nov 5, 2015

Also ping @stevvooe since I think you implemented something similar (though w/o the cleanup) in graph.

@LK4D4
Contributor

LK4D4 commented Nov 6, 2015

LGTM

LK4D4 added a commit that referenced this pull request Nov 6, 2015: Fix potential races in the volume store
@LK4D4 LK4D4 merged commit cc207aa into moby:master Nov 6, 2015
@cpuguy83 cpuguy83 deleted the use_finer_locking_for_volume_store branch November 6, 2015 16:54
@stevvooe
Contributor

stevvooe commented Nov 7, 2015

@cpuguy83 @LK4D4 Indeed. This is very similar to https://github.com/docker/docker/blob/master/graph/mutex.go except it manages the reference counts. Might be good to convert graph to use this.

@thaJeztah
Member

cool, merged! thanks!
