After upgrading to 17.12.0-ce containers are reported as "unknown" on start #35891
Comments
Short update: Starting and stopping containers does not seem to be the only event that triggers the aforementioned "unknown container" error. A quick journal check on the machine showed that the error message is emitted every few seconds for an already running container. There's only one container that keeps emitting the error, though. |
Was that container running before you upgraded? |
The release notes advised to stop all containers before the upgrade - so I did, by stopping the daemon. Stopping the daemon does stop all containers, correct? If so, then the answer is "no". I did not recreate the container, though, as this is on the second machine I tested on (I only recreated all the containers on the first machine). So it was started again after the upgrade, when the daemon was started. |
hm, never mind, I see these messages on a fresh install as well; this is the output of
Looks related to this change; #35812, but I'm wondering why those warnings are printed; it seems as if some code is listening for plugin events, but gets triggered for container events as well. |
I recreated the httpd container - the "unknown container" warning keeps popping up while the container is running. I'm not quite sure if my own activity triggers it, as it also occurs if I do nothing. But some activity does trigger the warning (so it's not just container start/stop). |
The warnings come from this part of the code: libcontainerd/client_daemon.go, lines 695 to 826 (at commit 52656da).
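In essence, that loop keeps track of the containers this client created and logs "unknown container" for any event whose container ID it doesn't recognize. A simplified, self-contained illustration of that pattern (my own sketch with made-up IDs, not the actual moby code):

```go
package main

import "log"

// eventInfo is a stand-in for the per-event info extracted from the stream.
type eventInfo struct {
	ContainerID string
	Topic       string
}

func main() {
	// Containers this client knows about (normally populated when it creates them).
	containers := map[string]struct{}{
		"0a1b2c": {},
	}

	// Incoming events; the second one references a container this client
	// never created (e.g. it belongs to the other client's namespace).
	events := []eventInfo{
		{ContainerID: "0a1b2c", Topic: "/tasks/start"},
		{ContainerID: "ffee99", Topic: "/tasks/exit"},
	}

	for _, ev := range events {
		if _, ok := containers[ev.ContainerID]; !ok {
			// This is the warning reported in this issue.
			log.Printf("unknown container: %s (topic %s)", ev.ContainerID, ev.Topic)
			continue
		}
		log.Printf("handling %s for container %s", ev.Topic, ev.ContainerID)
	}
}
```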
I added some debugging lines to that code;

diff --git a/libcontainerd/client_daemon.go b/libcontainerd/client_daemon.go
index a9f7c11dd..dd227648d 100644
--- a/libcontainerd/client_daemon.go
+++ b/libcontainerd/client_daemon.go
@@ -713,6 +713,7 @@ func (c *client) processEventStream(ctx context.Context) {
}
}()
+ c.logger.Infof("Subscribing to events with namespace: %q", c.namespace)
eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
Filters: []string{
"namespace==" + c.namespace,
@@ -747,6 +748,13 @@ func (c *client) processEventStream(ctx context.Context) {
c.logger.WithField("topic", ev.Topic).Debug("event")
+ switch t := v.(type) {
+ default:
+ c.logger.WithFields(logrus.Fields{
+ "topic": ev.Topic,
+ "type": reflect.TypeOf(t)},
+ ).Info("received event")
+ }
switch t := v.(type) {
case *events.TaskCreate:
et = EventCreate
@@ -814,6 +822,7 @@ func (c *client) processEventStream(ctx context.Context) {
c.logger.WithField("container", ei.ContainerID).Warn("unknown container")
continue
}
+ c.logger.WithField("container", ei.ContainerID).Warn("found container")
if oomKilled {
ctr.setOOMKilled(true)

With those changes, this is the output I get when starting a container. With debug enabled:
With debug disabled (for easier reading);
So, I get the impression events are received twice, once in each namespace, but (if I interpret correctly) they are only generated once (only in the container's own namespace).
Looking at this part of the containerd code; https://github.com/containerd/containerd/blob/2edc4758189c3bec00649804e5bb3840e082754d/events/exchange/exchange.go#L108-L115
I interpret that as: when providing multiple filters, only one of those filters has to match. If I'm correct, that filter subscribes to:
@stevvooe @dmcgowan any ideas? (I could be completely on the wrong foot here 😅) |
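To double-check that reading of the filter semantics outside the daemon, here's a small self-contained test program against containerd's filters package (my own sketch, not moby code; the topic value and namespace names are only illustrative):

```go
package main

import (
	"fmt"

	"github.com/containerd/containerd/filters"
)

// envelope is a minimal stand-in for an event envelope, exposing only the
// two fields we want to filter on.
type envelope struct {
	namespace string
	topic     string
}

// Field implements filters.Adaptor so the parsed filter can read our fields.
func (e envelope) Field(fieldpath []string) (string, bool) {
	if len(fieldpath) == 0 {
		return "", false
	}
	switch fieldpath[0] {
	case "namespace":
		return e.namespace, true
	case "topic":
		return e.topic, true
	}
	return "", false
}

func main() {
	// Two separate filter strings, similar in shape to the Subscribe request
	// above (the topic expression here is just an assumed example).
	f, err := filters.ParseAll("namespace==moby", `topic=="/tasks/exit"`)
	if err != nil {
		panic(err)
	}

	// An event from a different namespace, but with a matching topic:
	ev := envelope{namespace: "plugins.moby", topic: "/tasks/exit"}

	// Matches, because only one of the OR'ed expressions needs to match.
	fmt.Println(f.Match(ev)) // true
}
```

If that's correct, it would explain why each client's subscription also picks up events from the other namespace whenever the topic expression matches.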
Right, so I added another debug line;

c.logger.Infof("Received event with namespace: %q, but we subscribed to: %q", ev.Namespace, c.namespace)

diff --git a/libcontainerd/client_daemon.go b/libcontainerd/client_daemon.go
index a9f7c11dd..8a9885721 100644
--- a/libcontainerd/client_daemon.go
+++ b/libcontainerd/client_daemon.go
@@ -713,6 +713,7 @@ func (c *client) processEventStream(ctx context.Context) {
}
}()
+ c.logger.Infof("Subscribing to events with namespace: %q", c.namespace)
eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
Filters: []string{
"namespace==" + c.namespace,
@@ -745,8 +746,17 @@ func (c *client) processEventStream(ctx context.Context) {
continue
}
+ c.logger.Infof("Received event with namespace: %q, but we subscribed to: %q", ev.Namespace, c.namespace)
c.logger.WithField("topic", ev.Topic).Debug("event")
+ switch t := v.(type) {
+ default:
+ c.logger.WithFields(logrus.Fields{
+ "topic": ev.Topic,
+ "type": reflect.TypeOf(t)},
+ ).Info("received event")
+ }
+
switch t := v.(type) {
case *events.TaskCreate:
et = EventCreate
@@ -814,6 +824,7 @@ func (c *client) processEventStream(ctx context.Context) {
c.logger.WithField("container", ei.ContainerID).Warn("unknown container")
continue
}
+ c.logger.WithField("container", ei.ContainerID).Warn("found container")
if oomKilled {
ctr.setOOMKilled(true)

And it does indeed look like the filtering is the issue; the daemon starts two clients: one subscribes to events for the containers namespace, the other for the plugins namespace. And with that, you can see that both event-listeners receive events for both namespaces:
|
Event-filters were originally added in ddae20c#diff-10e0e646b98a7523848294e49bce04ecR664, but a fix was applied in a27abc6 (#35707)
Opened a PR with a fix; #35896 |
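For anyone curious about the filter syntax itself: comma-separated clauses inside a single filter string are AND'ed, whereas separate strings are OR'ed, so expressing "this namespace and this topic" as one combined string only matches events from that namespace. Another small self-contained sketch of that (just an illustration of the syntax, not necessarily the exact change in the PR):

```go
package main

import (
	"fmt"

	"github.com/containerd/containerd/filters"
)

// envelope is the same minimal stand-in used in the earlier sketch.
type envelope struct {
	namespace string
	topic     string
}

func (e envelope) Field(fieldpath []string) (string, bool) {
	if len(fieldpath) == 0 {
		return "", false
	}
	switch fieldpath[0] {
	case "namespace":
		return e.namespace, true
	case "topic":
		return e.topic, true
	}
	return "", false
}

func main() {
	// One filter string; the comma makes this a conjunction, so both the
	// namespace and the topic have to match.
	f, err := filters.ParseAll(`namespace==moby,topic=="/tasks/exit"`)
	if err != nil {
		panic(err)
	}

	fmt.Println(f.Match(envelope{namespace: "moby", topic: "/tasks/exit"}))         // true
	fmt.Println(f.Match(envelope{namespace: "plugins.moby", topic: "/tasks/exit"})) // false
}
```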
On Ubuntu 16.04 I did
into
to block version 17.12.0 because 17.09.1 is working for me and I am waiting for the new version 18. |
@bor8 the message that's printed here is not harmful, it's mainly a cosmetic issue (i.e. it logs the "unknown" containers, because it's looking for a plugin container) |
Well, then I had another problem. I couldn't connect to my containers via a script using exec -t (nothing happened), and in the logs there was "unknown container" or something like that. I'm not sure if there's any connection. |
FYI, I'm encountering the same problem. More importantly, just like bor8 commented, I also encounter problems with completely unresponsive containers (even 'docker inspect XYZ' hangs forever) - even though the application inside works fine. My impression is that all this started with the 17.12.0-ce update. Interestingly, most of the errors in journalctl for docker also seem related to healthchecks:
|
@pieterclaeys health checks hanging sounds like a different issue; it could be related to an issue in Runc that started to appear after the Meltdown and Spectre patches began rolling out to distros, see #36097 |
Encountering a similar issue where a container with a healthcheck hangs. The application works fine; this is on version 17.12.0-ce. journalctl logs that look related:
Edit: We have Meltdown and Spectre patches applied. |
OK, I'm not really familiar with debugging / diagnostics on hanging containers. What commands would be useful to get more information on such an "unresponsive" container (given that docker inspect, docker exec, docker stop, and docker kill all have no effect)? |
Yes, those "hangs" on healthcheck definitely sound like the issue that's being triggered by the meltdown patches. It's a bug in runc that has been in there forever, but never caused a problem until the meltdown/spectre patches were applied. If you have a system to test on (I'd not recommend making such changes on a production system), you could grab the
More information can be obtained by sending a SIGUSR1 signal to the daemon process.
Doing so will print a stack dump in the docker daemon logs (I think the same applies to the containerd process). This may require the daemon to be put in debug mode first (but it's worth trying without; perhaps that requirement was changed)
Alright, I left the container hanging. Maybe this deserves a new GitHub issue - I'll let you be the judge of that.
After that, the 'unknown container' errors no longer occur for this container. This 5789bcff5b... container is our own software (running a JBoss WildFly application server with a healthcheck). So I presume that the healthcheck failed at that time.
After that, there are 'unknown container' warnings only from cbe506ac5ab9c... (which is GitLab CE). I executed the two killall commands -> see both attached [files]. |
Description
Just upgraded to docker 17.12.0-ce on one of our CentOS VPS-boxes. Everything seems to run fine so far, but a journalctl -u docker shows errors about "unknown" containers, e.g.:

Apparently it complains about every container that runs on the machine. I tried to recreate all the containers, but that does not change anything. After a short investigation I noticed that this message is emitted on every container creation, at least in our setup.
Steps to reproduce the issue:
Describe the results you received:
On container start, I get:
On container stop:
Describe the results you expected:
There should be no warnings.
Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.):
The environment is provided by our hosting provider, as is the kernel. I've tested another machine using CentOS 7.4 as well, but with an elrepo kernel (4.13.8-1.el7.elrepo.x86_64); IMHO this does not look like a kernel issue.
The daemon.json is quite simple, so no surprises here: