Double setting of CONNECTION_CHECK_MAX_COUNT preventing pod adoption #532

@bitsofdave

Chart Version

8.5.3

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.15-eks-9c63c4", GitCommit:"9c63c4037a56f9cad887ee76d55142abd4155179", GitTreeState:"clean", BuildDate:"2021-10-20T00:21:03Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.5.0", GitCommit:"32c22239423b3b4ba6706d450bd044baffdcf9e6", GitTreeState:"dirty", GoVersion:"go1.15.6"}

Description

This is related to the closed question #416.

In my environment (Airflow 2.1.0, Chart 8.5.3), the double setting of CONNECTION_CHECK_MAX_COUNT is what causes the issue described in apache/airflow#20690, where the scheduler is unable to adopt pods after it is restarted.

I attached some logs with additional context, but the important part is here, where k8s sees the patch request as trying to change the value of an env variable from 0 to 20:

  Name:      "CONNECTION_CHECK_MAX_COUNT",
- Value:     "0",
+ Value:     "20",
  ValueFrom: nil

As a test, I edited the airflow-pod-template ConfigMap and removed the second occurrence, where CONNECTION_CHECK_MAX_COUNT is set to 0. After that, I was able to restart the scheduler mid-dagrun, and it adopted and deleted Completed pods as expected.
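
For anyone who wants to check their own rendered pod template for the same problem, here is a minimal sketch that reports env variable names defined more than once per container. It assumes the template has already been loaded into a plain dict (for example from the airflow-pod-template ConfigMap). The function name `duplicate_env_names`, the container name `base`, and the sample values and their order are illustrative assumptions, not taken from the chart.

```python
from collections import Counter


def duplicate_env_names(pod_template: dict) -> dict:
    """Return {container_name: [env names defined more than once]}."""
    duplicates = {}
    spec = pod_template.get("spec") or {}
    # Check both init containers and regular containers.
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    for container in containers:
        counts = Counter(env["name"] for env in container.get("env") or [])
        repeated = [name for name, count in counts.items() if count > 1]
        if repeated:
            duplicates[container["name"]] = repeated
    return duplicates


# Hypothetical, trimmed-down pod template (values and order are assumptions).
pod_template = {
    "spec": {
        "containers": [
            {
                "name": "base",
                "env": [
                    {"name": "CONNECTION_CHECK_MAX_COUNT", "value": "20"},
                    {"name": "AIRFLOW__CORE__EXECUTOR", "value": "LocalExecutor"},
                    {"name": "CONNECTION_CHECK_MAX_COUNT", "value": "0"},
                ],
            }
        ]
    }
}

print(duplicate_env_names(pod_template))
# {'base': ['CONNECTION_CHECK_MAX_COUNT']}
```

An empty result means no container defines the same env variable name twice.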

Relevant Logs

[2022-02-24 22:24:48,385] {kubernetes_executor.py:665} INFO - attempting to adopt pod podname.69b7727977014e948bc0b31cb2946803
[2022-02-24 22:24:48,399] {kubernetes_executor.py:683} INFO - Failed to adopt pod podname.69b7727977014e948bc0b31cb2946803. Reason: (422)
Reason: Unprocessable Entity
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c8ab7838-fdac-4e5f-b1cb-c0cd3a750bda', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 24 Feb 2022 22:24:48 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Pod \"podname.69b7727977014e948bc0b31cb2946803\" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)\n  core.PodSpec{\n  \tVolumes:        []core.Volume{{Name: \"aws-iam-token\", VolumeSource: core.VolumeSource{Projected: \u0026core.ProjectedVolumeSource{Sources: []core.VolumeProjection{{ServiceAccountToken: \u0026core.ServiceAccountTokenProjection{Audience: \"sts.amazonaws.com\", ExpirationSeconds: 86400, Path: \"token\"}}}, DefaultMode: \u0026420}}}, 
{Name: \"airflow-local-settings\", VolumeSource: core.VolumeSource{ConfigMap: \u0026core.ConfigMapVolumeSource{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-local-settings\"}, Items: []core.KeyToPath{{Key: \"airflow_local_settings.py\", Path: \"airflow_local_settings.py\"}}, DefaultMode: \u0026420}}},
{Name: \"airflow-token\", VolumeSource: core.VolumeSource{Secret: \u0026core.SecretVolumeSource{SecretName: \"airflow-token\", DefaultMode: \u0026420}}}},\n  \tInitContainers: nil,\n  \tContainers: []core.Container{\n  \t\t{\n  \t\t\t... // 5 identical fields\n  \t\t\tPorts:   nil,\n  \t\t\tEnvFrom: []core.EnvFromSource{{SecretRef: \u0026core.SecretEnvSource{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-config-envs\"}}}},\n  \t\t\tEnv: []core.EnvVar{\n  \t\t\t\t{\n  \t\t\t\t\tName:      \"CONNECTION_CHECK_MAX_COUNT\",\n- \t\t\t\t\tValue:     \"0\",\n+ \t\t\t\t\tValue:     \"20\",\n  \t\t\t\t\tValueFrom: nil,\n  \t\t\t\t},\n- \t\t\t\t{Name: \"CONNECTION_CHECK_MAX_COUNT\", Value: \"0\"},\n  \t\t\t\t{Name: \"AIRFLOW__CORE__EXECUTOR\", Value: \"LocalExecutor\"},\n  \t\t\t\t{Name: \"DATABASE_PASSWORD\", ValueFrom: \u0026core.EnvVarSource{SecretKeyRef: \u0026core.SecretKeySelector{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-postgres-password\"}, Key: \"postgres-password\"}}},\n  \t\t\t\t{Name: \"REDIS_PASSWORD\"},\n+ \t\t\t\t{Name: \"CONNECTION_CHECK_MAX_COUNT\", Value: \"0\"},\n  \t\t\t\t... // 8 identical elements\n  \t\t\t},\n  \t\t\tResources:    core.ResourceRequirements{Limits: core.ResourceList{s\"cpu\": {i: resource.int64Amount{value: 1100, scale: -3}, s: \"1100m\", Format: \"DecimalSI\"}, s\"memory\": {i: resource.int64Amount{value: 402653184}, Format: \"BinarySI\"}}, Requests: core.ResourceList{s\"cpu\": {i: resource.int64Amount{value: 800, scale: -3}, s: \"800m\", Format: \"DecimalSI\"}, s\"memory\": {i: resource.int64Amount{value: 268435456}, Format: \"BinarySI\"}}},\n  \t\t\tVolumeMounts: []core.VolumeMount{
{Name: \"airflow-local-settings\", ReadOnly: true, MountPath: \"/opt/airflow/config\"},
{Name: \"airflow-token\", ReadOnly: true, MountPath: \"/var/run/secrets/kubernetes.io/serviceaccount\"},
{Name: \"aws-iam-token\", ReadOnly: true, MountPath: \"/var/run/secrets/eks.amazonaws.com/serviceaccount\"}},\n  \t\t\t... // 12 identical fields\n  \t\t},\n  \t},\n  \tEphemeralContainers: nil,\n  \tRestartPolicy:       \"Never\",\n  \t... // 25 identical fields\n  }\n","reason":"Invalid","details":{"name":"podname.69b7727977014e948bc0b31cb2946803","kind":"Pod","causes":[{"reason":"FieldValueForbidden","message":"Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)\n  core.PodSpec{\n  \tVolumes:        []core.Volume{
{Name: \"aws-iam-token\", VolumeSource: core.VolumeSource{Projected: \u0026core.ProjectedVolumeSource{Sources: []core.VolumeProjection{{ServiceAccountToken: \u0026core.ServiceAccountTokenProjection{Audience: \"sts.amazonaws.com\", ExpirationSeconds: 86400, Path: \"token\"}}}, DefaultMode: \u0026420}}},
{Name: \"airflow-local-settings\", VolumeSource: core.VolumeSource{ConfigMap: \u0026core.ConfigMapVolumeSource{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-local-settings\"}, Items: []core.KeyToPath{{Key: \"airflow_local_settings.py\", Path: \"airflow_local_settings.py\"}}, DefaultMode: \u0026420}}},
{Name: \"airflow-token\", VolumeSource: core.VolumeSource{Secret: \u0026core.SecretVolumeSource{SecretName: \"airflow-token\", DefaultMode: \u0026420}}}},\n  \tInitContainers: nil,\n  \tContainers: []core.Container{\n  \t\t{\n  \t\t\t... // 5 identical fields\n  \t\t\tPorts:   nil,\n  \t\t\tEnvFrom: []core.EnvFromSource{{SecretRef: \u0026core.SecretEnvSource{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-config-envs\"}}}},\n  \t\t\tEnv: []core.EnvVar{\n  \t\t\t\t{\n  \t\t\t\t\tName:      \"CONNECTION_CHECK_MAX_COUNT\",\n- \t\t\t\t\tValue:     \"0\",\n+ \t\t\t\t\tValue:     \"20\",\n  \t\t\t\t\tValueFrom: nil,\n  \t\t\t\t},\n- \t\t\t\t{Name: \"CONNECTION_CHECK_MAX_COUNT\", Value: \"0\"},\n  \t\t\t\t{Name: \"AIRFLOW__CORE__EXECUTOR\", Value: \"LocalExecutor\"},\n  \t\t\t\t{Name: \"DATABASE_PASSWORD\", ValueFrom: \u0026core.EnvVarSource{SecretKeyRef: \u0026core.SecretKeySelector{LocalObjectReference: core.LocalObjectReference{Name: \"airflow-postgres-password\"}, Key: \"postgres-password\"}}},\n  \t\t\t\t{Name: \"REDIS_PASSWORD\"},\n+ \t\t\t\t{Name: \"CONNECTION_CHECK_MAX_COUNT\", Value: \"0\"},\n  \t\t\t\t... // 8 identical elements\n  \t\t\t},\n  \t\t\tResources:    core.ResourceRequirements{Limits: core.ResourceList{s\"cpu\": {i: resource.int64Amount{value: 1100, scale: -3}, s: \"1100m\", Format: \"DecimalSI\"}, s\"memory\": {i: resource.int64Amount{value: 402653184}, Format: \"BinarySI\"}}, Requests: core.ResourceList{s\"cpu\": {i: resource.int64Amount{value: 800, scale: -3}, s: \"800m\", Format: \"DecimalSI\"}, s\"memory\": {i: resource.int64Amount{value: 268435456}, Format: \"BinarySI\"}}},\n  \t\t\tVolumeMounts: []core.VolumeMount{
{Name: \"airflow-local-settings\", ReadOnly: true, MountPath: \"/opt/airflow/config\"},
{Name: \"airflow-token\", ReadOnly: true, MountPath: \"/var/run/secrets/kubernetes.io/serviceaccount\"},
{Name: \"aws-iam-token\", ReadOnly: true, MountPath: \"/var/run/secrets/eks.amazonaws.com/serviceaccount\"}},\n  \t\t\t... // 12 identical fields\n  \t\t},\n  \t},\n  \tEphemeralContainers: nil,\n  \tRestartPolicy:       \"Never\",\n  \t... // 25 identical fields\n  }\n","field":"spec"}]},"code":422}
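
The 422 body above is hard to read in raw form, so here is a rough sketch for pulling only the changed lines out of the `message` field. It assumes the response body is available as a JSON string (with the Python Kubernetes client that would typically be the `body` attribute of the raised `ApiException`). The helper name `changed_lines` and the abbreviated sample message are made up for illustration.

```python
import json


def changed_lines(response_body: str) -> list:
    """Return the '-' / '+' lines of the diff embedded in the API error message."""
    message = json.loads(response_body)["message"]
    changes = []
    for line in message.split("\n"):
        # Diff lines start with "- " (removed) or "+ " (added).
        if line.startswith(("- ", "+ ")):
            # Keep the marker and collapse the tab/space padding after it.
            changes.append(line[0] + " " + " ".join(line[1:].split()))
    return changes


# Abbreviated, hypothetical message in the same shape as the log above.
sample_message = "\n".join([
    'Pod "example-pod" is invalid: spec: Forbidden: pod updates may not change fields',
    '  \t\t\t\t\tName:      "CONNECTION_CHECK_MAX_COUNT",',
    '- \t\t\t\t\tValue:     "0",',
    '+ \t\t\t\t\tValue:     "20",',
    '  \t\t\t\t\tValueFrom: nil,',
])
sample_body = json.dumps({"kind": "Status", "message": sample_message, "code": 422})

for line in changed_lines(sample_body):
    print(line)
# - Value: "0",
# + Value: "20",
```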

Custom Helm Values

No response

Labels

kind/bug (kind - things not working properly)
