panic / v1alpha2._RuntimeService / SIGSEGV code=0x1 addr=0x38 pc=0x55f266891275 #3312

@nikopen

bug report

Description

On startup, after "Start streaming server" and a few operations, containerd crashloops almost exactly every minute, at the same second each time, and prints the following stack trace:

May 29 12:19:01 minionpublic1 containerd[9691]: time="2019-05-29T12:19:01.096721955Z" level=warning msg="The image sha256:f0fad859c909baef1b038ef8d2f6e76fc252e25a3d9af37
May 29 12:19:01 minionpublic1 containerd[9691]: time="2019-05-29T12:19:01.119703566Z" level=info msg="Start event monitor"
May 29 12:19:01 minionpublic1 containerd[9691]: time="2019-05-29T12:19:01.119822706Z" level=info msg="Start snapshots syncer"
May 29 12:19:01 minionpublic1 containerd[9691]: time="2019-05-29T12:19:01.119834583Z" level=info msg="Start streaming server"
May 29 12:19:10 minionpublic1 containerd[9691]: time="2019-05-29T12:19:08.807256438Z" level=info msg="RemoveContainer for "1cde6af58b9e9ec2ea4481d7ece8314a701bb0a67ba649
May 29 12:19:10 minionpublic1 containerd[9691]: time="2019-05-29T12:19:08.807491426Z" level=info msg="RemoveContainer for "1cde6af58b9e9ec2ea4481d7ece8314a701bb0a67ba649
May 29 12:19:10 minionpublic1 containerd[9691]: panic: runtime error: invalid memory address or nil pointer dereference
May 29 12:19:10 minionpublic1 containerd[9691]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x55fe3e32f275]
May 29 12:19:10 minionpublic1 containerd[9691]: goroutine 85 [running]:
May 29 12:19:10 minionpublic1 containerd[9691]: github.com/containerd/containerd.WithSnapshotCleanup(0x55fe3f2ab0c0, 0xc00072c3c0, 0xc0003fc0b0, 0xc000208ac0, 0x40, 0xc0
May 29 12:19:10 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/container_opts.go:144 +0xd5
May 29 12:19:10 minionpublic1 containerd[9691]: github.com/containerd/containerd.(*container).Delete(0xc0006bcc80, 0x55fe3f2ab0c0, 0xc00072c3c0, 0xc00000e080, 0x1, 0x1,
May 29 12:19:10 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/container.go:152 +0x1a1
May 29 12:19:10 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/github.com/containerd/cri/pkg/server.(*criService).RemoveContainer(0xc000462480,
May 29 12:19:10 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/cri/pkg/server/container_re
May 29 12:19:10 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/github.com/containerd/cri/pkg/server.(*instrumentedService).RemoveContainer(0xc00
May 29 12:19:10 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/containerd/cri/pkg/server/instrumented
May 29 12:19:10 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2._RuntimeService_RemoveCon
May 29 12:19:10 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1
May 29 12:19:08 minionpublic1 systemd[1]: containerd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 29 12:19:11 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor(0x55fe3f2ab0c
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/serv
May 29 12:19:11 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1alpha2._RuntimeService_RemoveCon
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/k8s.io/kubernetes/pkg/kubelet/apis/cri/runtime/v1
May 29 12:19:11 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc000134000, 0x55fe3f2b5440, 0x
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1011 +0x4cf
May 29 12:19:11 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc000134000, 0x55fe3f2b5440, 0xc00
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1249 +0x1313
May 29 12:19:11 minionpublic1 containerd[9691]: github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0004d45a0, 0xc000134000,
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:680 +0xa1
May 29 12:19:11 minionpublic1 containerd[9691]: created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
May 29 12:19:11 minionpublic1 containerd[9691]:         /home/travis/gopath/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:678 +0xa3
May 29 12:19:08 minionpublic1 systemd[1]: containerd.service: Failed with result 'exit-code'.

It is important to note that when the k8s node comes up, it first starts containerd 1.2.5, the version installed by Docker. That instance runs for about 90 seconds without doing anything (it starts successfully, as the logs show). containerd is then restarted as 1.2.6, which is installed afterwards, and the panic loop begins.

After roughly an hour of crashlooping, it stops.

Steps to reproduce the issue:

It happens on only one node out of many, but the affected node changes. I cannot pinpoint any configuration difference or node-specific problem.

The most relevant clue seems to be v1alpha2._RuntimeService, though I don't know exactly what it is. In slightly different configurations, pods fail to start and print a similar error mentioning the v1alpha2 RuntimeService.

Between crashloops, it prints many messages like these:


May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4427 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4512 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4560 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4660 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4810 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4912 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 4954 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 5123 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 5246 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 5403 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 5531 (containerd-shim) in control group while starting unit. Ignoring.
May 29 12:44:40 minionpublic1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 29 12:44:40 minionpublic1 systemd[1]: containerd.service: Found left-over process 5634 (containerd-shim) in control group while starting unit. Ignoring.

The same message also appears for other processes, such as pod processes and s6-supervisor.

Versions

containerd github.com/containerd/containerd v1.2.6 894b81a4b802e4eb2a91d1ce216b8817763c29fb

runc v1.0.0-rc8

k8s 1.11.8

Config

io.containerd.runc.v1 shim
overlayfs
CNI 0.7.5
