Fix concurrency issue in dind#48850
Conversation
978045f to
51f4c9d
Compare
|
I've run into and tried to fix this exact same race several times, and it's complex because if this code fails, I think Docker-in-Docker will fail too, so it has to succeed and simply ignoring failure isn't correct. 😅 Every time I run into this, I fix it by checking whether |
|
Just to give you a bit of background I am running minikube with dind in a gitlab pipeline.
Since I started ignoring
I can do this, but I don't understand how this would fix my problem. As setup_cgroup_controllers(){
sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
> /sys/fs/cgroup/cgroup.subtree_control \
&>/dev/null
}
# busy wait for setup_cgroup_controllers to complete
while ! setup_cgroup_controllers; do sleep 0; doneWhat do you think @tianon ? |
|
Hmm, that's not a bad idea, although it would need to include the I don't think we need a function, and something like this could probably work (although needs some testing): mkdir -p /sys/fs/cgroup/init
# this happens in a loop because things like "docker exec" on our dind
# container will create new processes, which creates a race between our
# moving everything to "init" and enabling subtree_control
while ! {
# move the processes from the root group to the /init group,
# otherwise writing subtree_control fails with EBUSY.
# An error during moving non-existent process (i.e., "cat") is ignored.
xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
# enable controllers
sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
> /sys/fs/cgroup/cgroup.subtree_control
}; do true; done(I don't think we need to worry too much about the error messages here -- this basically just puts the existing code into a tight little loop, as you've proposed.) I forgot what I'd done previously to reproduce this reliably (I recall having a ~100% failure rate in the past), but with the following I managed to get something that does reproduce the race/failure some percentage of the time, and after applying the above change I was no longer able to reproduce: #!/usr/bin/env bash
set -Eeuo pipefail -x
docker run -di --rm --name dind --mount "type=bind,src=$PWD/hack/dind,dst=/dind,ro" --entrypoint /dind --privileged docker:dind dockerd --log-level error
trap 'docker rm -vf dind' EXIT
docker logs -f dind &
while state="$(docker container inspect --format '{{ .State.Status }}' dind)" && [ "$state" = 'running' ] && ! docker exec dind docker version; do true; done |
|
@tianon : Awesome ! A few questions:
#!/usr/bin/env bash
set -e
while ! {
false
true
}; do sleep 0; done
echo "Should never get here"
|
Yes, that's not a big deal in our case because the
Yes, in fact we need the |
|
ok, @tianon . Glad you considered all these things. 🚀 |
| xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || : | ||
| # enable controllers | ||
| sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \ | ||
| > /sys/fs/cgroup/cgroup.subtree_control | ||
| > /sys/fs/cgroup/cgroup.subtree_control || : |
There was a problem hiding this comment.
| xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || : | |
| # enable controllers | |
| sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \ | |
| > /sys/fs/cgroup/cgroup.subtree_control | |
| > /sys/fs/cgroup/cgroup.subtree_control || : | |
| # this happens in a loop because things like "docker exec" on our dind | |
| # container will create new processes, which creates a race between our | |
| # moving everything to "init" and enabling subtree_control | |
| while ! { | |
| # move the processes from the root group to the /init group, | |
| # otherwise writing subtree_control fails with EBUSY. | |
| # An error during moving non-existent process (i.e., "cat") is ignored. | |
| xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || : | |
| # enable controllers | |
| sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \ | |
| > /sys/fs/cgroup/cgroup.subtree_control | |
| }; do true; done |
(I'm happy to update this PR on your behalf if you'd prefer! ❤️)
There was a problem hiding this comment.
Awesome. Thank you! @JSchltggr can you add this change? Otherwise, I can also open a new merge request.
There was a problem hiding this comment.
Thanks a lot tianon!
Unfortunately it was not possible to commit the suggestion directly. I've added the changes to another commit.
There was a problem hiding this comment.
It's probably ok to squash the 2 commits if possible to keep a cleaner git history
There was a problem hiding this comment.
@thaJeztah can you not just squash merge?
There was a problem hiding this comment.
We disabled squash merge on this repository as it discards GPG signing of the PR author, and generally prefer the contributor to squash commits (or organise PRs into logical commits, which is part of the review process), so if @JSchltggr is able to do a squash, that'd be great, otherwise I can manually squash and push to the branch through my maintainer permssions
There was a problem hiding this comment.
two weeks without response from @JSchltggr , can you maybe go forward and squash merge this @thaJeztah ? Thank you lots 🙏
|
@tianon : changes were added by @JSchltggr 🚀 I guess this can be merged now? |
Signed-off-by: JSchltggr <[email protected]>
786945e to
c43aa0b
Compare
|
@thaJeztah we got the squash 🎉 . Thank you @JSchltggr . Can this be merged now? |
When dind is run alongside with other processes, it might happen that the first xargs call fails (ignored) but then also the call to sed fails. Leading to the following error messages and dind terminating (
set -e) :This fixes the behavior by ignoring non 0 exit codes from sed
- What I did
Start a dind container which spawns some short living processes. Sometimes the call to sed will fail, because the process is already killed.
- How I did it
Create a simple Dockerfile and entrypoint:
Dockerfile
entrypoint.sh
- How to verify it
Build the container
Run the container in a loop and wait for the error to occur
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)