This repository was archived by the owner on Mar 9, 2022. It is now read-only.

Kubelet restart terminates graceful termination #1098

@lbernail

If the kubelet is restarted while containers are in graceful termination, waiting for the grace period to expire, waitContainerStop returns on context cancellation and the task is killed immediately instead of being left to finish its grace period.

Steps to reproduce:

  • create a pod with a long terminationGracePeriodSeconds
  • delete the pod; it enters the Terminating state
  • restart (or stop) the kubelet => the associated container is killed immediately and the pod is deleted

Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sig
  labels:
    app: sig
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sig
  template:
    metadata:
      labels:
        app: sig
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: sig
        image: lbernail/sig:0.1
        imagePullPolicy: Always
      terminationGracePeriodSeconds: 3600

(lbernail/sig:0.1 simply ignores SIGTERM)

After deleting the pod, we get these logs:

containerd[951]: time="2019-03-25T12:05:25Z" level=info msg="StopContainer for "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" with timeout 3600 (s)"
containerd[951]: time="2019-03-25T12:05:25Z" level=info msg="Stop container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" with signal terminated"

If we then restart/stop the kubelet, we immediately get:

containerd[951]: time="2019-03-25T12:05:51Z" level=error msg="Stop container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" timed out" error="wait container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" is cancelled"
containerd[951]: time="2019-03-25T12:05:51Z" level=info msg="Kill container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4""
containerd[951]: time="2019-03-25T12:05:51Z" level=error msg="StopContainer for "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" failed" error="an error occurs during waiting for container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" to stop: wait container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" is cancelled"
containerd[951]: time="2019-03-25T12:05:51Z" level=info msg="Finish piping stdout of container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4""
containerd[951]: time="2019-03-25T12:05:51Z" level=info msg="Finish piping stderr of container "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4""
containerd[951]: time="2019-03-25T12:05:51Z" level=info msg="shim reaped" id=2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4
containerd[951]: time="2019-03-25T12:09:50Z" level=info msg="Container to stop "2d533fb60ec326c7b46c76c9b11c1c9e09fe694f59a3bc5baa8f46e114749ab4" is not running, current state "CONTAINER_EXITED""

cc @Random-Liu
