Skip to content

c8d/list: Fix race condition when traversing containers#48367

Merged
vvoland merged 3 commits intomoby:masterfrom
vvoland:c8d-list-race-fix
Aug 26, 2024
Merged

c8d/list: Fix race condition when traversing containers#48367
vvoland merged 3 commits intomoby:masterfrom
vvoland:c8d-list-race-fix

Conversation

@vvoland
Copy link
Copy Markdown
Contributor

@vvoland vvoland commented Aug 23, 2024

- What I did
Fix a panic spotted in #47983 (comment).

Shared state is mutated inside a function which is called asynchronously:

// ApplyAll calls the reducer function with every container in the store.
// This operation is asynchronous in the memory store.
// NOTE: Modifications to the store MUST NOT be done by the StoreReducer.
func (c *memoryStore) ApplyAll(apply StoreReducer) {
wg := new(sync.WaitGroup)
for _, cont := range c.all() {
wg.Add(1)
go func(container *Container) {
apply(container)
wg.Done()
}(cont)
}
wg.Wait()
}

Unfortunately, it looks like the race detector doesn't pick this up for some reason.

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@vvoland vvoland added area/images Image Service kind/bugfix PR's that fix bugs containerd-integration Issues and PRs related to containerd integration process/cherry-pick/27.2 labels Aug 23, 2024
@vvoland vvoland added this to the 28.0.0 milestone Aug 23, 2024
@vvoland vvoland self-assigned this Aug 23, 2024
@vvoland vvoland requested review from laurazard and thaJeztah August 23, 2024 11:11
@thaJeztah
Copy link
Copy Markdown
Member

Hmm. So wondering; should synchronization for this happen in the memory store itself? I see it has locking, but (iiuc) for the store itself, and it looks like a RWMutex taking a read lock. However, the slice returned does not dereference the container; is the problem in that area? Should it either return a copy of the container, or should locking happen for each container? 😬

Comment thread daemon/containerd/image_list.go Outdated
@vvoland vvoland force-pushed the c8d-list-race-fix branch from 35c8422 to 50a0335 Compare August 23, 2024 12:07
@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Aug 23, 2024

@thaJeztah - There's no need to lock the container object itself, if the caller doesn't really mutate the container (like we do here), so it wouldn't make sense to lock inside ApplyAll.
We only read the ID and ImageManifest property of the image, which don't change after the container is created, so it's fine to read from the reference in this case.

@vvoland
Copy link
Copy Markdown
Contributor Author

vvoland commented Aug 23, 2024

The only reason why this sneaked past me is:

  • ApplyAll doesn't look like it would act asynchronously (and I didn't bother to check the implementation 🙈)
  • For some reason, it wasn't detected when I tested this with -race. Not sure why race detectors doesn't catch this, will need to investigate it more.

Copy link
Copy Markdown
Member

@laurazard laurazard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread daemon/containerd/image_list.go Outdated
@vvoland vvoland force-pushed the c8d-list-race-fix branch 3 times, most recently from ec83ce6 to 5ea8392 Compare August 23, 2024 17:31
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@thaJeztah
Copy link
Copy Markdown
Member

Ah, dang; linting failure somewhere

daemon/containerd/image_list_test.go:7: File is not `goimports`-ed (goimports)
	"github.com/docker/docker/container"

Use a regular for loop instead of ApplyAll which spawns a separate
goroutine for each separate container.

Signed-off-by: Paweł Gronowski <[email protected]>
Signed-off-by: Paweł Gronowski <[email protected]>
Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Copy Markdown
Member

I fixed teh linting issue; problem was in the imports;

diff --git a/daemon/containerd/image_list_test.go b/daemon/containerd/image_list_test.go
index 50ba317e03..28b61db4b5 100644
--- a/daemon/containerd/image_list_test.go
+++ b/daemon/containerd/image_list_test.go
@@ -4,7 +4,6 @@ import (
        "context"
        "errors"
        "fmt"
-       "github.com/docker/docker/container"
        "math/rand"
        "os"
        "path/filepath"
@@ -23,6 +22,7 @@ import (
        "github.com/containerd/log/logtest"
        "github.com/containerd/platforms"
        imagetypes "github.com/docker/docker/api/types/image"
+       "github.com/docker/docker/container"
        daemonevents "github.com/docker/docker/daemon/events"
        "github.com/docker/docker/internal/testutils/specialimage"
        "github.com/opencontainers/go-digest"

Verify that the ImageData.Containers contains the ID of the container
using that image.

Signed-off-by: Paweł Gronowski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/images Image Service containerd-integration Issues and PRs related to containerd integration kind/bugfix PR's that fix bugs process/cherry-picked

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants