Fetching the driver logs using SparkKubernetesSensor fails when the driver pod has a sidecar #18468

@cristian-fatu

Apache Airflow version

2.1.1

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

I tried to run a simple Spark application using the SparkKubernetesOperator and SparkKubernetesSensor.
In the YAML file for the Spark Operator, I added a sidecar container to the driver pod.
When the job runs in Airflow, the SparkKubernetesSensor step fails with the following error:

[2021-09-23 13:24:21,547] {spark_kubernetes.py:92} WARNING - Could not read logs for pod pyspark-pi-driver. It may have been disposed.
Make sure timeToLiveSeconds is set on your SparkApplication spec.
underlying exception: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Thu, 23 Sep 2021 13:24:21 GMT', 'Content-Length': '233'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"a container name must be specified for pod pyspark-pi-driver, choose one of: [spark-kubernetes-driver logging-sidecar]","reason":"BadRequest","code":400}\n'

My YAML file does not set timeToLiveSeconds, so the driver pod still exists when the job finishes and fetching its logs should not be a problem.

I believe the error occurs because the call to get_pod_logs from within SparkKubernetesSensor._log_driver passes only the driver pod name and no container name. That call works when the driver container is the only container in the pod, but it fails when the pod has multiple containers.
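A minimal sketch of the idea, not the provider's actual code: pass the container name explicitly when reading the pod log. The helper `read_driver_logs` and the stub `FakeCoreV1` are hypothetical names used only for illustration. The container name `spark-kubernetes-driver` is taken from the error message above. The `core_v1` argument is assumed to behave like `kubernetes.client.CoreV1Api`, whose `read_namespaced_pod_log` method accepts a `container` keyword.

```python
from typing import Any, Dict, Optional

# Driver container name as listed in the 400 error above.
DRIVER_CONTAINER = "spark-kubernetes-driver"


def read_driver_logs(
    core_v1: Any,
    pod_name: str,
    namespace: str,
    container: Optional[str] = DRIVER_CONTAINER,
) -> str:
    """Read the logs of a single container in a pod.

    `core_v1` is assumed to expose read_namespaced_pod_log(name, namespace,
    container=...) in the same way kubernetes.client.CoreV1Api does.
    """
    return core_v1.read_namespaced_pod_log(
        name=pod_name,
        namespace=namespace,
        container=container,
    )


class FakeCoreV1:
    """Hypothetical stand-in that imitates the multi-container failure."""

    def __init__(self, logs_by_container: Dict[str, str]) -> None:
        self.logs_by_container = logs_by_container

    def read_namespaced_pod_log(
        self, name: str, namespace: str, container: Optional[str] = None
    ) -> str:
        # Refuse to guess when the pod has more than one container.
        if container is None and len(self.logs_by_container) > 1:
            choices = " ".join(self.logs_by_container)
            raise ValueError(
                f"a container name must be specified for pod {name}, "
                f"choose one of: [{choices}]"
            )
        if container is None:
            container = next(iter(self.logs_by_container))
        return self.logs_by_container[container]
```

Against a real cluster, `core_v1` would presumably be a `kubernetes.client.CoreV1Api()` instance created after loading the cluster configuration. The namespace is whatever the SparkApplication runs in.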

I'm attaching my DAG and YAML files.
spark-py-pi-dag-and-yaml.tar.gz

What you expected to happen

The SparkKubernetesSensor should be able to get the driver container logs even if there are sidecar containers running alongside the driver.

How to reproduce

The attached YAML and DAG definition can be used to reproduce the issue.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
