Don't remove containerd's container after exit #47
Conversation
vvoland
left a comment
I'm wondering, since we already use containerd for managing the container runtime - why does it work in the old case when Delete is being performed?
Will this work for stopped containers if the daemon and/or containerd and/or machine is restarted?
The issue is actually that now, when we remove containerd's container, the snapshot is also removed. I tried using leases so that the snapshot is not deleted, but with no luck.
We still lose all the containers when the daemon restarts. My guess is that it's another issue; maybe the containers aren't restored as they should be. We should take a look at this in a follow-up.
Yeah, ok, so we will need to take a closer look at the restore logic, I think; it reads a directory and loops over it to restore the containers.
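(For illustration, a minimal sketch of a directory-based restore pass of the kind described above; the path, the function names, and the skip-on-error behaviour are assumptions, not the daemon's actual restore code.)

```go
// Illustrative sketch only: walk a containers directory and try to reload
// each container, skipping entries that fail instead of aborting the whole
// restore. loadContainer is a hypothetical stand-in for the real logic.
package main

import (
	"log"
	"os"
	"path/filepath"
)

func restoreContainers(containersDir string) error {
	entries, err := os.ReadDir(containersDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		id := e.Name()
		if err := loadContainer(filepath.Join(containersDir, id)); err != nil {
			// A container that fails to load should not break restoring the rest.
			log.Printf("failed to restore container %s: %v", id, err)
		}
	}
	return nil
}

// loadContainer is a placeholder for reading the container's persisted state
// and re-registering it with the daemon's in-memory stores.
func loadContainer(dir string) error {
	_ = dir
	return nil
}

func main() {
	if err := restoreContainers("/var/lib/docker/containers"); err != nil {
		log.Fatal(err)
	}
}
```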
Ah, hm.. gotcha; yes, I'd also be thinking of leases for that. Removing the snapshots would definitely be a problem for "pet containers". Wondering if this is what nerdctl is using for that (I see it sets a label; could that label be to prevent it being garbage collected?): https://github.com/containerd/nerdctl/blob/31065c3e52f924d34bc9df370eef6b091a3e31ba/cmd/nerdctl/stop.go#L96-L99
Oh, never mind; looks like that was added to store additional settings, I think: containerd/nerdctl@97ece6a
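(For context, containerd can pin resources against its garbage collector with labels; below is a rough sketch of marking a snapshot as a GC root via the `containerd.io/gc.root` label. The socket path, namespace, and snapshot key are placeholder assumptions, and this is not what the PR ended up doing.)

```go
// Illustrative sketch: pin a snapshot so containerd's GC will not sweep it,
// by setting the "containerd.io/gc.root" label on the snapshot's info.
package main

import (
	"context"
	"log"
	"time"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func pinSnapshot(ctx context.Context, client *containerd.Client, snapshotter, key string) error {
	sn := client.SnapshotService(snapshotter)
	info, err := sn.Stat(ctx, key)
	if err != nil {
		return err
	}
	if info.Labels == nil {
		info.Labels = map[string]string{}
	}
	// A gc.root label marks the snapshot as a garbage-collection root.
	info.Labels["containerd.io/gc.root"] = time.Now().UTC().Format(time.RFC3339)
	_, err = sn.Update(ctx, info, "labels.containerd.io/gc.root")
	return err
}

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder namespace and snapshot key.
	ctx := namespaces.WithNamespace(context.Background(), "moby")
	if err := pinSnapshot(ctx, client, "overlayfs", "example-snapshot-key"); err != nil {
		log.Fatal(err)
	}
}
```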
#49 I drafted a quick implementation of the leases. Tested with: works fine
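(A rough sketch of the lease-based approach, for illustration only; it is not the code from #49, and the function name and lease ID scheme are assumptions.)

```go
// Illustrative sketch: create a containerd lease for a container and attach
// its read-write snapshot to it, so the snapshot survives deletion of the
// containerd container until the lease itself is released.
package snapshotlease

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/leases"
)

func protectSnapshot(ctx context.Context, client *containerd.Client, containerID, snapshotter, snapshotKey string) (func(context.Context) error, error) {
	lm := client.LeasesService()

	// Resources attached to a lease are excluded from garbage collection
	// until the lease is deleted.
	l, err := lm.Create(ctx, leases.WithID("moby-container-"+containerID))
	if err != nil {
		return nil, err
	}

	// Attach the container's read-write snapshot to the lease.
	if err := lm.AddResource(ctx, l, leases.Resource{
		ID:   snapshotKey,
		Type: "snapshots/" + snapshotter,
	}); err != nil {
		return nil, err
	}

	// The caller releases the lease once the container is actually removed.
	return func(ctx context.Context) error {
		return lm.Delete(ctx, l)
	}, nil
}
```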
daemon/delete.go (outdated):

```go
if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil {
	logrus.Errorf("%s cleanup: failed to delete container from containerd: %v", container.ID, err)
```
Does this return an error if the container was already gone? If so, can we detect an errdefs.IsNotExist()?
If we can't exclude "NotExist" errors, this should probably be logged as a Warnf (because we handle the situation); perhaps using WithField() and WithError():

```go
logrus.WithError(err).WithField("container", container.ID).Error("cleanup: failed to delete container from containerd")
```
Does this require a daemon.containerd.DeleteTask as well? (not sure; but noticed that's done in some of the other code)
I used the same code we use in Cleanup https://github.com/moby/moby/blob/master/daemon/start.go#L264-L266
Added your suggestion for the log
If I add a check for IsNotExist, what should we do in the two cases?
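(For illustration, one way the two cases could be handled: treat "not found" as already cleaned up, and log anything else at warn level. The containerdClient interface below is a hypothetical simplification of the daemon's libcontainerd client, not its real signature.)

```go
// Illustrative sketch of the error handling discussed above.
package sketch

import (
	"context"

	"github.com/docker/docker/errdefs"
	"github.com/sirupsen/logrus"
)

// containerdClient is a hypothetical, narrowed-down stand-in for the
// daemon's real libcontainerd client.
type containerdClient interface {
	Delete(ctx context.Context, containerID string) error
}

func cleanupContainerdContainer(ctx context.Context, c containerdClient, containerID string) {
	err := c.Delete(ctx, containerID)
	switch {
	case err == nil || errdefs.IsNotFound(err):
		// Already gone (or just deleted): nothing left to clean up.
	default:
		// Unexpected, but the situation is handled, so warn instead of error.
		logrus.WithError(err).WithField("container", containerID).
			Warn("cleanup: failed to delete container from containerd")
	}
}
```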
```go
// best effort to clean up old container object
daemon.containerd.DeleteTask(context.Background(), container.ID)
if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil && !errdefs.IsNotFound(err) {
```
Checking whether this is correct: previously (before we used snapshotters), deleting the container from containerd didn't have side-effects (we had control over the container's filesystem, so deleting the container was just deleting the instance/config of the container).
However, now that we do use the snapshotters, would Delete() here also remove the container's filesystem?
Perhaps in case of IsConflict() and we're using snapshotters we should consider it "done" (container already exists; nothing to be done)
When using containerd snapshotters we won't try to create a container if it already exists, so we can't have any conflicts
Of course, unless someone tries to create a container with the same ID at the same time and we manage to somehow get in here...
Should we just return the conflict in that case? I realise it would be a rare condition (something else creating a container with the exact same ID). In general, getting a conflict in this situation would be unexpected, so not 100% sure we should "proceed as usual".
Okay, added a check and a comment to explain why we do it, thanks
```diff
 err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions, newContainerOpts...)
 if err != nil {
-	return translateContainerdStartErr(container.Path, container.SetExitCode, err)
+	if errdefs.IsConflict(err) {
```
I had to double-check if we could use our own errdefs here, but it looks like containerd.Create() does the conversion for this specific error; moby/libcontainerd/remote/client.go, lines 142 to 144 in 44626ea:

```go
if containerderrors.IsAlreadyExists(err) {
	return errors.WithStack(errdefs.Conflict(errors.New("id already in use")))
}
```
We should indeed create a general utility for that though (making sure we convert all errors to our own?)
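(A sketch of what such a general conversion utility might look like, mapping containerd error classes onto moby's errdefs categories; the function name is hypothetical and this is not code from the PR.)

```go
// Illustrative sketch: wrap containerd errors into the matching moby errdefs
// category so callers can rely on errdefs.IsConflict, errdefs.IsNotFound, etc.
package sketch

import (
	cerrdefs "github.com/containerd/containerd/errdefs"
	"github.com/docker/docker/errdefs"
)

func fromContainerd(err error) error {
	switch {
	case err == nil:
		return nil
	case cerrdefs.IsNotFound(err):
		return errdefs.NotFound(err)
	case cerrdefs.IsAlreadyExists(err):
		return errdefs.Conflict(err)
	case cerrdefs.IsInvalidArgument(err):
		return errdefs.InvalidParameter(err)
	case cerrdefs.IsUnavailable(err):
		return errdefs.Unavailable(err)
	case cerrdefs.IsNotImplemented(err):
		return errdefs.NotImplemented(err)
	default:
		return err
	}
}
```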
Keeping containerd's container around while our container is active makes `docker start` possible.
Signed-off-by: Djordje Lukic <[email protected]>
thaJeztah
left a comment
LGTM
Not upstreamed yet, but should be combined with:
@rumpl this one still needed in upstream? Or superseded by the other PRs we did? (sorry, I'm lazy 😂)
This one still needs to be (partially) upstreamed
Okay, I just checked and this one is good now
Keeping containerd's container around while our container is active makes `docker start` possible. This is not really ideal; the best way to go about this would be to create the containerd container on create, but this would change too much of the logic. Once we switch completely to containerd we can come back to this and change the logic.