After upgrading to 17.12.0-ce containers are reported as "unknown" on start #35891

Closed
nscheer opened this issue Dec 28, 2017 · 18 comments · Fixed by #35896
Labels
area/distribution · area/runtime · kind/bug · version/17.12

Comments

@nscheer

nscheer commented Dec 28, 2017

Description

I just upgraded to Docker 17.12.0-ce on one of our CentOS VPS boxes. Everything seems to run fine so far, but journalctl -u docker shows warnings about "unknown" containers, e.g.:

Dec 28 21:11:37 ***** dockerd[20364]: time="2017-12-28T21:11:37.395119873+01:00" level=warning msg="unknown container" container=380a254343d466da3481d10a9eb00b5ceb0246d5af80ed458875b4cff0ce5272 module=libcontainerd namespace=plugins.moby

Apparently it complains about every container running on the machine. I tried recreating all the containers, but that did not change anything. After a short investigation I noticed that this message is emitted on every container creation, at least in our setup.

Steps to reproduce the issue:

  1. docker pull alpine:3.7
  2. docker run --rm -it alpine:3.7 sh
  3. [ctrl+d]

Describe the results you received:
On container start, I get:

Dec 28 21:37:16 ***** dockerd[20364]: time="2017-12-28T21:37:16.078480737+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
Dec 28 21:37:16 ***** dockerd[20364]: time="2017-12-28T21:37:16+01:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/3c63b33d715570dd874f244f3dc4dba4ac606f1420ce7e24d52401fbcdcb3402/shim.sock" debug=false module="containerd/tasks" pid=22712
Dec 28 21:37:16 ***** dockerd[20364]: time="2017-12-28T21:37:16.278804687+01:00" level=warning msg="unknown container" container=3c63b33d715570dd874f244f3dc4dba4ac606f1420ce7e24d52401fbcdcb3402 module=libcontainerd namespace=plugins.moby
Dec 28 21:37:16 ***** dockerd[20364]: time="2017-12-28T21:37:16.313901450+01:00" level=warning msg="unknown container" container=3c63b33d715570dd874f244f3dc4dba4ac606f1420ce7e24d52401fbcdcb3402 module=libcontainerd namespace=plugins.moby

On container stop:

Dec 28 21:37:28 ***** dockerd[20364]: time="2017-12-28T21:37:28.254949107+01:00" level=warning msg="unknown container" container=3c63b33d715570dd874f244f3dc4dba4ac606f1420ce7e24d52401fbcdcb3402 module=libcontainerd namespace=plugins.moby
Dec 28 21:37:28 ***** dockerd[20364]: time="2017-12-28T21:37:28+01:00" level=info msg="shim reaped" id=3c63b33d715570dd874f244f3dc4dba4ac606f1420ce7e24d52401fbcdcb3402 module="containerd/tasks"
Dec 28 21:37:28 ***** dockerd[20364]: time="2017-12-28T21:37:28.330940300+01:00" level=info msg="ignoring event" module=libcontainerd namespace=plugins.moby topic=/tasks/delete type="*events.TaskDelete"
Dec 28 21:37:28 ***** dockerd[20364]: time="2017-12-28T21:37:28.331907436+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Dec 28 21:37:28 ***** dockerd[20364]: time="2017-12-28T21:37:28.376950290+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"

Describe the results you expected:

There should be no warnings.

Output of docker version:

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    c97c6d6
 Built: Wed Dec 27 20:10:14 2017
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   c97c6d6
  Built:        Wed Dec 27 20:12:46 2017
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 6
 Running: 4
 Paused: 0
 Stopped: 2
Images: 48
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.4.27-x86_64-jb1
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.903GiB
Name: **************
ID: VBJF:EXSO:5X7A:CY5M:ITIS:6SNX:5JIK:HXKR:YHBY:QUNE:245Q:5LJG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: nscheer
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

The environment is provided by our hosting company, as is the kernel. I've also tested another machine running CentOS 7.4, but with an elrepo kernel (4.13.8-1.el7.elrepo.x86_64); IMHO this does not look like a kernel issue.

The daemon.json is quite simple, so no surprises here:

{
    "storage-driver": "overlay2",
    "log-driver": "json-file",
    "log-opts":
    {
        "max-size": "10m",
        "max-file": "3"
    }
}
@nscheer
Author

nscheer commented Dec 28, 2017

Short update: starting and stopping containers does not seem to be the only event that triggers the aforementioned "unknown container" warning.

A quick journal check on the machine showed that the message is emitted every few seconds for an already running container. There's only one container that keeps emitting it, though.
It's an httpd container, and it seems as if activity on the container triggers the warnings (HTTP GET requests in this case).

@thaJeztah
Member

Was that container running before you upgraded?

@nscheer
Author

nscheer commented Dec 28, 2017

The release notes advised to stop all containers before upgrading - so I did a systemctl stop docker before the upgrade (I then upgraded manually using the rpm, i.e. yum erase docker-ce followed by yum install docker-ce-17.12.0.ce-1.el7.centos.x86_64.rpm).

Stopping the daemon does stop all containers, correct? Then the answer is "no".

I did not recreate that container, though, as this is on the second machine I tested (I only recreated all the containers on the first machine). So it was started again after the upgrade, when the daemon came back up.

@thaJeztah
Member

hm, never mind, I see these messages on a fresh install as well; this is the output of docker run -dit --rm alpine:3.7 sh;

Dec 28 23:47:13 ubuntu-2gb-ams3-01 dockerd[3567]: time="2017-12-28T23:47:13.575989205Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
Dec 28 23:47:13 ubuntu-2gb-ams3-01 dockerd[3567]: time="2017-12-28T23:47:13Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/904bcfbafa59cdf9d54a1953c1d64c810af97e2c9be43e431fbd6608d0828498/shim.sock" debug=false module="containerd/tasks" pid=4008
Dec 28 23:47:13 ubuntu-2gb-ams3-01 dockerd[3567]: time="2017-12-28T23:47:13.773612254Z" level=warning msg="unknown container" container=904bcfbafa59cdf9d54a1953c1d64c810af97e2c9be43e431fbd6608d0828498 module=libcontainerd namespace=plugins.moby
Dec 28 23:47:13 ubuntu-2gb-ams3-01 dockerd[3567]: time="2017-12-28T23:47:13.793758746Z" level=warning msg="unknown container" container=904bcfbafa59cdf9d54a1953c1d64c810af97e2c9be43e431fbd6608d0828498 module=libcontainerd namespace=plugins.moby

Looks related to this change: #35812, but I'm wondering why those warnings are printed; it seems as if some code is listening for plugin events, but gets triggered for container events as well.

ping @stevvooe @cpuguy83 PTAL

@nscheer
Author

nscheer commented Dec 29, 2017

I recreated the httpd container - the "unknown container" warning keeps popping up while the container is running. I'm not quite sure whether my activity triggers it, as it also occurs if I do nothing, but some activity does trigger the warning (so it's not just container start/stop).

@thaJeztah
Member

The warnings come from this part of the code;

func (c *client) processEventStream(ctx context.Context) {
	var (
		err         error
		eventStream eventsapi.Events_SubscribeClient
		ev          *eventsapi.Envelope
		et          EventType
		ei          EventInfo
		ctr         *container
	)
	defer func() {
		if err != nil {
			select {
			case <-ctx.Done():
				c.logger.WithError(ctx.Err()).
					Info("stopping event stream following graceful shutdown")
			default:
				go c.processEventStream(ctx)
			}
		}
	}()

	eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
		Filters: []string{
			"namespace==" + c.namespace,
			"topic~=/tasks/",
		},
	}, grpc.FailFast(false))
	if err != nil {
		return
	}

	var oomKilled bool
	for {
		ev, err = eventStream.Recv()
		if err != nil {
			errStatus, ok := status.FromError(err)
			if !ok || errStatus.Code() != codes.Canceled {
				c.logger.WithError(err).Error("failed to get event")
			}
			return
		}

		if ev.Event == nil {
			c.logger.WithField("event", ev).Warn("invalid event")
			continue
		}

		v, err := typeurl.UnmarshalAny(ev.Event)
		if err != nil {
			c.logger.WithError(err).WithField("event", ev).Warn("failed to unmarshal event")
			continue
		}

		c.logger.WithField("topic", ev.Topic).Debug("event")

		switch t := v.(type) {
		case *events.TaskCreate:
			et = EventCreate
			ei = EventInfo{
				ContainerID: t.ContainerID,
				ProcessID:   t.ContainerID,
				Pid:         t.Pid,
			}
		case *events.TaskStart:
			et = EventStart
			ei = EventInfo{
				ContainerID: t.ContainerID,
				ProcessID:   t.ContainerID,
				Pid:         t.Pid,
			}
		case *events.TaskExit:
			et = EventExit
			ei = EventInfo{
				ContainerID: t.ContainerID,
				ProcessID:   t.ID,
				Pid:         t.Pid,
				ExitCode:    t.ExitStatus,
				ExitedAt:    t.ExitedAt,
			}
		case *events.TaskOOM:
			et = EventOOM
			ei = EventInfo{
				ContainerID: t.ContainerID,
				OOMKilled:   true,
			}
			oomKilled = true
		case *events.TaskExecAdded:
			et = EventExecAdded
			ei = EventInfo{
				ContainerID: t.ContainerID,
				ProcessID:   t.ExecID,
			}
		case *events.TaskExecStarted:
			et = EventExecStarted
			ei = EventInfo{
				ContainerID: t.ContainerID,
				ProcessID:   t.ExecID,
				Pid:         t.Pid,
			}
		case *events.TaskPaused:
			et = EventPaused
			ei = EventInfo{
				ContainerID: t.ContainerID,
			}
		case *events.TaskResumed:
			et = EventResumed
			ei = EventInfo{
				ContainerID: t.ContainerID,
			}
		default:
			c.logger.WithFields(logrus.Fields{
				"topic": ev.Topic,
				"type":  reflect.TypeOf(t)},
			).Info("ignoring event")
			continue
		}

		ctr = c.getContainer(ei.ContainerID)
		if ctr == nil {
			c.logger.WithField("container", ei.ContainerID).Warn("unknown container")
			continue
		}

		if oomKilled {
			ctr.setOOMKilled(true)
			oomKilled = false
		}
		ei.OOMKilled = ctr.getOOMKilled()

		c.processEvent(ctr, et, ei)
	}
}

I added some debugging lines to that code;

diff --git a/libcontainerd/client_daemon.go b/libcontainerd/client_daemon.go
index a9f7c11dd..dd227648d 100644
--- a/libcontainerd/client_daemon.go
+++ b/libcontainerd/client_daemon.go
@@ -713,6 +713,7 @@ func (c *client) processEventStream(ctx context.Context) {
                }
        }()
 
+       c.logger.Infof("Subscribing to events with namespace: %q", c.namespace)
        eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
                Filters: []string{
                        "namespace==" + c.namespace,
@@ -747,6 +748,13 @@ func (c *client) processEventStream(ctx context.Context) {
 
                c.logger.WithField("topic", ev.Topic).Debug("event")
 
+               switch t := v.(type) {
+               default:
+                       c.logger.WithFields(logrus.Fields{
+                               "topic": ev.Topic,
+                               "type":  reflect.TypeOf(t)},
+                       ).Info("received event")
+               }
                switch t := v.(type) {
                case *events.TaskCreate:
                        et = EventCreate
@@ -814,6 +822,7 @@ func (c *client) processEventStream(ctx context.Context) {
                        c.logger.WithField("container", ei.ContainerID).Warn("unknown container")
                        continue
                }
+               c.logger.WithField("container", ei.ContainerID).Warn("found container")
 
                if oomKilled {
                        ctr.setOOMKilled(true)

With those changes, running docker run -dit alpine:3.7 sh produces the following logs:

With debug enabled:

INFO[2017-12-29T00:29:15.316845654Z] Subscribing to events with namespace: "plugins.moby"  module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T00:29:15.345345470Z] Subscribing to events with namespace: "moby"  module=libcontainerd namespace=moby
...
DEBU[2017-12-29T00:54:06.823921091Z] Calling POST /v1.30/containers/create        
DEBU[2017-12-29T00:54:06.824238383Z] form data: {"AttachStderr":false,"AttachStdin":false,"AttachStdout":false,"Cmd":["sh"],"Domainname":"","Entrypoint":null,"Env":[],"HostConfig":{"AutoRemove":false,"Binds":null,"BlkioDeviceReadBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceWriteIOps":null,"BlkioWeight":0,"BlkioWeightDevice":null,"CapAdd":null,"CapDrop":null,"Cgroup":"","CgroupParent":"","ConsoleSize":[0,0],"ContainerIDFile":"","CpuCount":0,"CpuPercent":0,"CpuPeriod":0,"CpuQuota":0,"CpuRealtimePeriod":0,"CpuRealtimeRuntime":0,"CpuShares":0,"CpusetCpus":"","CpusetMems":"","DeviceCgroupRules":null,"Devices":[],"DiskQuota":0,"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IOMaximumBandwidth":0,"IOMaximumIOps":0,"IpcMode":"","Isolation":"","KernelMemory":0,"Links":null,"LogConfig":{"Config":{},"Type":""},"Memory":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":-1,"NanoCpus":0,"NetworkMode":"default","OomKillDisable":false,"OomScoreAdj":0,"PidMode":"","PidsLimit":0,"PortBindings":{},"Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"RestartPolicy":{"MaximumRetryCount":0,"Name":"no"},"SecurityOpt":null,"ShmSize":0,"UTSMode":"","Ulimits":null,"UsernsMode":"","VolumeDriver":"","VolumesFrom":null},"Hostname":"","Image":"alpine:3.7","Labels":{},"NetworkingConfig":{"EndpointsConfig":{}},"OnBuild":null,"OpenStdin":true,"StdinOnce":false,"Tty":true,"User":"","Volumes":{},"WorkingDir":""} 
DEBU[2017-12-29T00:54:06.885418050Z] container mounted via layerStore: &{/var/lib/docker/overlay2/66f5273f6d893462b7fd31e20d8ac6baf5155b0e190ec53a3f69dd34fddc0185/merged 0x2c41ee0 0x2c41ee0} 
DEBU[2017-12-29T00:54:06.958430905Z] Calling POST /v1.30/containers/a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f/wait?condition=next-exit 
DEBU[2017-12-29T00:54:06.959046909Z] Calling POST /v1.30/containers/a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f/start 
DEBU[2017-12-29T00:54:06.959820782Z] container mounted via layerStore: &{/var/lib/docker/overlay2/66f5273f6d893462b7fd31e20d8ac6baf5155b0e190ec53a3f69dd34fddc0185/merged 0x2c41ee0 0x2c41ee0} 
DEBU[2017-12-29T00:54:06.960165116Z] Assigning addresses for endpoint gallant_swanson's interface on network bridge 
DEBU[2017-12-29T00:54:06.960192387Z] RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[]) 
DEBU[2017-12-29T00:54:06.964008988Z] Assigning addresses for endpoint gallant_swanson's interface on network bridge 
DEBU[2017-12-29T00:54:06.967287013Z] Programming external connectivity on endpoint gallant_swanson (b712d1ba9bc47e6fd56397e619d0a3aec62b2fcd6022702dfdc5a21f6b4abb83) 
DEBU[2017-12-29T00:54:06.968543253Z] EnableService a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f START 
DEBU[2017-12-29T00:54:06.968569971Z] EnableService a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f DONE 
DEBU[2017-12-29T00:54:06.970466421Z] bundle dir created                            bundle=/var/run/docker/containerd/a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f module=libcontainerd namespace=moby root=/var/lib/docker/overlay2/66f5273f6d893462b7fd31e20d8ac6baf5155b0e190ec53a3f69dd34fddc0185/merged
DEBU[0004] event published                               module="containerd/containers" ns=moby topic="/containers/create" type=containerd.events.ContainerCreate
DEBU[2017-12-29T00:54:06.975310441Z] event                                         module=libcontainerd namespace=moby topic=/containers/create
INFO[2017-12-29T00:54:06.975339199Z] received event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[2017-12-29T00:54:06.975365767Z] ignoring event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[0004] shim docker-containerd-shim started           address="/containerd-shim/moby/a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f/shim.sock" debug=true module="containerd/tasks" pid=2185
DEBU[0000] registering ttrpc server                     
DEBU[0000] serving api on unix socket                    socket="[inherited from parent]"
DEBU[2017-12-29T00:54:07.150987785Z] sandbox set key processing took 126.453698ms for container a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f 
DEBU[0004] event published                               module="containerd/tasks" ns=moby topic="/tasks/create" type=containerd.events.TaskCreate
DEBU[2017-12-29T00:54:07.333920651Z] event                                         module=libcontainerd namespace=moby topic=/tasks/create
INFO[2017-12-29T00:54:07.334029339Z] received event                                module=libcontainerd namespace=moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T00:54:07.334115480Z] found container                               container=a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f module=libcontainerd namespace=moby
DEBU[2017-12-29T00:54:07.334114515Z] event                                         module=libcontainerd namespace=plugins.moby topic=/tasks/create
INFO[2017-12-29T00:54:07.334597035Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T00:54:07.334626310Z] unknown container                             container=a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f module=libcontainerd namespace=plugins.moby
DEBU[0004] event published                               module="containerd/tasks" ns=moby topic="/tasks/start" type=containerd.events.TaskStart
DEBU[2017-12-29T00:54:07.346693853Z] event                                         module=libcontainerd namespace=moby topic=/tasks/start
INFO[2017-12-29T00:54:07.346736929Z] received event                                module=libcontainerd namespace=moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T00:54:07.346785559Z] found container                               container=a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f module=libcontainerd namespace=moby
DEBU[2017-12-29T00:54:07.346729352Z] event                                         module=libcontainerd namespace=plugins.moby topic=/tasks/start
INFO[2017-12-29T00:54:07.346868321Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T00:54:07.346970045Z] unknown container                             container=a502c039f0dc0d5f61b20e3fff2eb1660e2594f8aaeeeb3f4517d9a1a6e7b37f module=libcontainerd namespace=plugins.moby

With debug disabled (for easier reading);

INFO[2017-12-29T00:51:51.365118077Z] Subscribing to events with namespace: "plugins.moby"  module=libcontainerd namespace=plugins.moby
...
INFO[2017-12-29T00:51:51.372195549Z] Subscribing to events with namespace: "moby"  module=libcontainerd namespace=moby
...
INFO[2017-12-29T00:51:56.225535080Z] received event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[2017-12-29T00:51:56.225607461Z] ignoring event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[0004] shim docker-containerd-shim started           address="/containerd-shim/moby/c52b5f361fa30913e313f74f94563e3e7047874c5808ee0e1a422d6b8a63cbda/shim.sock" debug=false module="containerd/tasks" pid=1982
INFO[2017-12-29T00:51:56.653343640Z] received event                                module=libcontainerd namespace=moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T00:51:56.653386119Z] found container                               container=c52b5f361fa30913e313f74f94563e3e7047874c5808ee0e1a422d6b8a63cbda module=libcontainerd namespace=moby
INFO[2017-12-29T00:51:56.653626735Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T00:51:56.653671783Z] unknown container                             container=c52b5f361fa30913e313f74f94563e3e7047874c5808ee0e1a422d6b8a63cbda module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T00:51:56.666526163Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T00:51:56.666582098Z] unknown container                             container=c52b5f361fa30913e313f74f94563e3e7047874c5808ee0e1a422d6b8a63cbda module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T00:51:56.666532696Z] received event                                module=libcontainerd namespace=moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T00:51:56.666635935Z] found container                               container=c52b5f361fa30913e313f74f94563e3e7047874c5808ee0e1a422d6b8a63cbda module=libcontainerd namespace=moby

So, I get the impression events are received twice, once in each namespace, but (if I interpret this correctly) they are only generated once, only in the moby namespace, as expected, because that's the namespace for containers;

DEBU[0004] event published                               module="containerd/containers" ns=moby topic="/containers/create" type=containerd.events.ContainerCreate
DEBU[0004] event published                               module="containerd/tasks" ns=moby topic="/tasks/create" type=containerd.events.TaskCreate
DEBU[0004] event published                               module="containerd/tasks" ns=moby topic="/tasks/start" type=containerd.events.TaskStart

Looking at this part of the containerd code; https://github.com/containerd/containerd/blob/2edc4758189c3bec00649804e5bb3840e082754d/events/exchange/exchange.go#L108-L115

// Zero or more filters may be provided as strings. Only events that match
// any of the provided filters will be sent on the channel.

I interpret that as: when providing multiple filters, only one of those filters has to match. Does that mean that Filters: []string{"namespace==" + c.namespace, "topic~=/tasks/"} matches all events that either have namespace==<provided-namespace> or have topic~=/tasks/?

If I'm correct, that filter subscribes to;

  • events with topic~=/tasks/ (for any namespace)
  • events with any other topic, but with a matching namespace

@stevvooe @dmcgowan any ideas? (I could be completely on the wrong foot here 😅)
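
If that reading is right, one way to scope the subscription to a single namespace would be to AND both terms inside one filter string instead of passing two separate filters. This is a sketch of the idea only, assuming containerd ANDs comma-separated terms within a single filter string (the actual fix was opened as #35896, and its exact filter expression may differ):

// sketch only: one filter string; comma-separated terms are ANDed,
// so only /tasks/ events in our own namespace should match
eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
	Filters: []string{
		"namespace==" + c.namespace + ",topic~=/tasks/",
	},
}, grpc.FailFast(false))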

@thaJeztah
Member

Right, so I added another debug line;

c.logger.Infof("Received event with namespace: %q, but we subscribed to: %q", ev.Namespace, c.namespace)
diff --git a/libcontainerd/client_daemon.go b/libcontainerd/client_daemon.go
index a9f7c11dd..8a9885721 100644
--- a/libcontainerd/client_daemon.go
+++ b/libcontainerd/client_daemon.go
@@ -713,6 +713,7 @@ func (c *client) processEventStream(ctx context.Context) {
                }
        }()
 
+       c.logger.Infof("Subscribing to events with namespace: %q", c.namespace)
        eventStream, err = c.remote.EventService().Subscribe(ctx, &eventsapi.SubscribeRequest{
                Filters: []string{
                        "namespace==" + c.namespace,
@@ -745,8 +746,17 @@ func (c *client) processEventStream(ctx context.Context) {
                        continue
                }
 
+               c.logger.Infof("Received event with namespace: %q, but we subscribed to: %q", ev.Namespace, c.namespace)
                c.logger.WithField("topic", ev.Topic).Debug("event")
 
+               switch t := v.(type) {
+               default:
+                       c.logger.WithFields(logrus.Fields{
+                               "topic": ev.Topic,
+                               "type":  reflect.TypeOf(t)},
+                       ).Info("received event")
+               }
+
                switch t := v.(type) {
                case *events.TaskCreate:
                        et = EventCreate
@@ -814,6 +824,7 @@ func (c *client) processEventStream(ctx context.Context) {
                        c.logger.WithField("container", ei.ContainerID).Warn("unknown container")
                        continue
                }
+               c.logger.WithField("container", ei.ContainerID).Warn("found container")
 
                if oomKilled {
                        ctr.setOOMKilled(true)

And it does indeed look like the filtering is the issue: the daemon starts two clients, one subscribing to events for the moby namespace and one for the plugins.moby namespace, but both receive events for both namespaces (see my previous comment regarding "multiple filters").

With that, you can see that both event listeners receive events from the moby namespace (Received event with namespace: "moby", but we subscribed to: "plugins.moby");

INFO[2017-12-29T01:21:45.851196900Z] Received event with namespace: "moby", but we subscribed to: "moby"  module=libcontainerd namespace=moby
INFO[2017-12-29T01:21:45.851239471Z] received event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[2017-12-29T01:21:45.851470362Z] ignoring event                                module=libcontainerd namespace=moby topic=/containers/create type="*events.ContainerCreate"
INFO[0012] shim docker-containerd-shim started           address="/containerd-shim/moby/061412af96e01451f2c94c420b5b086b71c9d5d9a5a7ff1ff7b0f4bc933577b7/shim.sock" debug=false module="containerd/tasks" pid=428
INFO[2017-12-29T01:21:46.234262659Z] Received event with namespace: "moby", but we subscribed to: "moby"  module=libcontainerd namespace=moby
INFO[2017-12-29T01:21:46.234317781Z] received event                                module=libcontainerd namespace=moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T01:21:46.234335567Z] found container                               container=061412af96e01451f2c94c420b5b086b71c9d5d9a5a7ff1ff7b0f4bc933577b7 module=libcontainerd namespace=moby
INFO[2017-12-29T01:21:46.234275978Z] Received event with namespace: "moby", but we subscribed to: "plugins.moby"  module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T01:21:46.234435703Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/create type="*events.TaskCreate"
WARN[2017-12-29T01:21:46.234459406Z] unknown container                             container=061412af96e01451f2c94c420b5b086b71c9d5d9a5a7ff1ff7b0f4bc933577b7 module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T01:21:46.246755507Z] Received event with namespace: "moby", but we subscribed to: "moby"  module=libcontainerd namespace=moby
INFO[2017-12-29T01:21:46.246810059Z] received event                                module=libcontainerd namespace=moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T01:21:46.246826062Z] found container                               container=061412af96e01451f2c94c420b5b086b71c9d5d9a5a7ff1ff7b0f4bc933577b7 module=libcontainerd namespace=moby
INFO[2017-12-29T01:21:46.246781088Z] Received event with namespace: "moby", but we subscribed to: "plugins.moby"  module=libcontainerd namespace=plugins.moby
INFO[2017-12-29T01:21:46.246880678Z] received event                                module=libcontainerd namespace=plugins.moby topic=/tasks/start type="*events.TaskStart"
WARN[2017-12-29T01:21:46.246903313Z] unknown container                             container=061412af96e01451f2c94c420b5b086b71c9d5d9a5a7ff1ff7b0f4bc933577b7 module=libcontainerd namespace=plugins.moby

thaJeztah added the kind/bug label Dec 29, 2017
@thaJeztah
Member

thaJeztah commented Dec 29, 2017

Event filters were originally added in ddae20c#diff-10e0e646b98a7523848294e49bce04ecR664, but a fix was applied in a27abc6 (#35707)

/cc @mlaventure @crosbymichael

@thaJeztah
Member

Opened a PR with a fix; #35896

@bor8

bor8 commented Jan 16, 2018

On Ubuntu 16.04 I put

Package: docker-ce
Pin: version 17.12.0~ce-0~ubuntu
Pin-Priority: -1

into

/etc/apt/preferences.d/docker-ce

to block version 17.12.0, because 17.09.1 works for me and I am waiting for version 18.

@thaJeztah
Member

@bor8 the message that's printed here is not harmful; it's mainly a cosmetic issue (i.e., containers are logged as "unknown" because the plugins event listener is looking for a plugin container)

@bor8

bor8 commented Jan 16, 2018

Well, then I had another problem: I couldn't connect to my containers via script and exec -t (nothing happened), and the logs showed "unknown container" or something like that. I'm not sure there's any connection.

@ghost

ghost commented Jan 29, 2018

FYI, I'm encountering the same problem.
I have servers where the docker journalctl logs are full of "unknown container" entries. Apparently, if a container runs a healthcheck every minute, there will be an "unknown container" log line every minute (see the sketch below).
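
For context, a hypothetical reproduction of that pattern: each healthcheck probe is run as an exec inside the container, and each probe run produced one of these warnings (the container name and probe command here are made up):

# hypothetical: a trivial healthcheck probe that runs every minute;
# on 17.12.0 each probe run logged one "unknown container" warning
docker run -d --name health-demo \
  --health-cmd=true \
  --health-interval=1m \
  alpine:3.7 sleep 1d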

More importantly, just like bor8 commented, I also encounter problems with completely unresponsive containers (even docker inspect XYZ hangs forever), even though the application inside works fine.
Other containers might respond fine.
But then I need to completely reboot the server to get docker working again.

My impression is that all of this started with the 17.12.0-ce update.
So I'm curious whether this could be caused by this problem (... then I'll just wait for the patch to arrive) or is still some other issue.

Interestingly, most of the errors in journalctl for docker also seem related to healthchecks:

Jan 24 11:57:44 dockersrv01 dockerd[1142]: time="2018-01-24T11:57:44.905869208+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 24 11:58:19 dockersrv01 dockerd[1142]: time="2018-01-24T11:58:19.981091888+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 28 01:16:31 dockersrv01 dockerd[1142]: time="2018-01-28T01:16:31.733072314+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 28 01:21:37 dockersrv01 dockerd[1142]: time="2018-01-28T01:21:37.469955160+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 28 01:23:18 dockersrv01 dockerd[1142]: time="2018-01-28T01:23:18.682106518+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 28 01:31:20 dockersrv01 dockerd[1142]: time="2018-01-28T01:31:20.508894226+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"

@thaJeztah
Member

@pieterclaeys health checks hanging sounds like a different issue; it could be related to a bug in runc that started to appear after the Meltdown and Spectre patches began rolling out to distros, see #36097

@emcfarlane

emcfarlane commented Jan 29, 2018

I'm encountering a similar issue where a container with a healthcheck hangs. The application works fine; this is on version 17.12.0-ce. journalctl logs that look related:

Jan 29 16:05:32 ci-runner-4 dockerd[791]: time="2018-01-29T16:05:32.843402624Z" level=warning msg="unknown container" container=056edc01afc254af1a2b0f0aef2c74c207c294a00152b6aefdf3fba5aecb576a module=libcontainerd namespace=plugins.moby
Jan 29 16:05:32 ci-runner-4 dockerd[791]: time="2018-01-29T16:05:32Z" level=info msg="shim reaped" id=056edc01afc254af1a2b0f0aef2c74c207c294a00152b6aefdf3fba5aecb576a module="containerd/tasks"
Jan 29 16:05:32 ci-runner-4 dockerd[791]: time="2018-01-29T16:05:32.888768481Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 29 16:05:32 ci-runner-4 dockerd[791]: time="2018-01-29T16:05:32.889006188Z" level=info msg="ignoring event" module=libcontainerd namespace=plugins.moby topic=/tasks/delete type="*events.TaskDelete"
Jan 29 16:05:32 ci-runner-4 dockerd[791]: time="2018-01-29T16:05:32.916853066Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/containers/delete type="*events.ContainerDelete"

Edit: We have Meltdown and Spectre patches applied.

@ghost

ghost commented Jan 29, 2018

OK, I'm not really familiar with debugging/diagnostics on hanging containers. What commands would be useful to get more information on such an "unresponsive" container (given that docker inspect, docker exec, docker stop, and docker kill all have no effect)?
(using 17.12.0-ce on CentOS 7)

@thaJeztah
Member

Yes, those "hangs" on healthcheck definitely sound like the issue that's being triggered by the Meltdown patches. It's a bug in runc that has been there forever, but never caused a problem until the Meltdown/Spectre patches were applied.

If you have a system to test on (I'd not recommend making such changes on a production system), you could grab the docker-runc binary from the Docker CE 18.02.0-rc1 packages (currently available in the "test" channel, e.g., for Ubuntu Xenial: https://download.docker.com/linux/ubuntu/dists/xenial/pool/test/amd64/), and replace the docker-runc binary of 17.12 with that one.
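
For anyone who wants to try that, here's a rough sketch of the swap on Ubuntu Xenial; the package filename, extraction path, and binary location are assumptions, so adapt them to what you actually download:

# sketch only: swap in docker-runc from the 18.02.0-rc1 test-channel package
systemctl stop docker
cp /usr/bin/docker-runc /usr/bin/docker-runc.17.12.bak   # keep a backup
dpkg-deb -x docker-ce_18.02.0~ce~rc1-0~ubuntu_amd64.deb /tmp/docker-ce-test
cp /tmp/docker-ce-test/usr/bin/docker-runc /usr/bin/docker-runc
systemctl start docker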

Ok I'm not really familiar with debugging / diagnostics on hanging containers.

More information can be obtained by sending a SIGUSR1 signal to the dockerd and docker-containerd processes;

killall -USR1 dockerd
killall -USR1 docker-containerd

Doing so will print a stack dump in the docker daemon logs (I think for dockerd it will print the location of a file where the stack dump is written to).

This may require the daemon to be put in debug mode first (but it's worth trying without; perhaps that requirement was changed)
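
For reference, a minimal sketch of turning on debug mode; the "debug" key in daemon.json is the documented option, and dockerd reloads it on SIGHUP, so a full restart should not be needed (verify on your own setup):

# merge "debug": true into /etc/docker/daemon.json, e.g.:
#   { "debug": true }
# then ask the daemon to reload its configuration:
kill -HUP "$(pidof dockerd)"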

@ghost

ghost commented Jan 29, 2018

Alright, I left the container hanging. Maybe this deserves a new GitHub issue; I'll let you be the judge of that.
I have an export of journalctl -u docker that I have annotated. Interestingly, the problematic container suddenly starts reporting errors at Jan 28, 01:15-01:20.

Jan 28 01:21:37 dockersrv01 dockerd[1142]: time="2018-01-28T01:21:37.469955160+01:00" level=warning msg="Health check for container 5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 error: context cancelled"
Jan 28 01:21:37 dockersrv01 dockerd[1142]: time="2018-01-28T01:21:37.986542995+01:00" level=warning msg="unknown container" container=5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 module=libcontainerd namespace=plugins.moby
Jan 28 01:21:37 dockersrv01 dockerd[1142]: time="2018-01-28T01:21:37.986568804+01:00" level=warning msg="Ignoring Exit Event, no such exec command found" container=5789bcff5bf2a93f5b72091664e24ae068c8adaa0e677fd9aa370bbf2182c533 exec-id=5d5091131152a7ef1c5f8a39a46ca45a2c42b9a51f98cd561a4f654e61476d48 exec-pid=29969

After that, the "unknown container" warnings no longer occur for this container. This 5789bcff5b... container is our own software (a JBoss WildFly application server with a healthcheck), so I presume the healthcheck failed at that time.
It also prints

Jan 28 01:33:07 dockersrv01 dockerd[1142]: time="2018-01-28T01:33:07.344016369+01:00" level=error msg="stream copy error: reading from a closed fifo"
Jan 28 01:33:07 dockersrv01 dockerd[1142]: time="2018-01-28T01:33:07.344086676+01:00" level=error msg="stream copy error: reading from a closed fifo"

After that, there are "unknown container" warnings only from cbe506ac5ab9c... (which is GitLab CE).
So clearly the healthcheck for 5789bcff5b... was dead at that point.

I executed the two killall commands; see both attached files:
journalctl_docker_output.txt
goroutine-stacks-2018-01-29T190415+0100.log
