DatabricksSubmitRunOperator Get failed for Multi Task Databricks Job Run #28812

@NishchayAgrawal

Description

Apache Airflow Provider(s)

databricks

Versions of Apache Airflow Providers

We run DatabricksSubmitRunOperator to launch a multi-task Databricks job, and we have seen this behavior across essentially every provider version we have tried. When the Databricks job fails, DatabricksSubmitRunOperator raises the error below. The cause is that the operator calls the `get-output` API with the parent job run id rather than the individual task run ids, and that endpoint does not support multi-task job runs.
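The workaround suggested by the API error message is to resolve the per-task run ids first and then request each task's output individually. A minimal sketch of that flow against the Databricks Jobs 2.1 REST API (the `host`, `token`, and helper names here are illustrative placeholders, not part of the Airflow provider):

```python
import json
import urllib.parse
import urllib.request


def task_run_ids(run_info):
    """Extract each task's own run_id from a /api/2.1/jobs/runs/get response."""
    return [task["run_id"] for task in run_info.get("tasks", [])]


def _api_get(host, token, endpoint, params):
    # Minimal GET helper; host and token are placeholders for a real workspace.
    url = f"{host}{endpoint}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_task_outputs(host, token, job_run_id):
    """Fetch the output of every individual task in a multi-task job run.

    runs/get-output only accepts a task run_id, so we first look up the
    parent run to enumerate its tasks, then query each task separately.
    """
    run_info = _api_get(host, token, "/api/2.1/jobs/runs/get", {"run_id": job_run_id})
    return {
        task_run_id: _api_get(
            host, token, "/api/2.1/jobs/runs/get-output", {"run_id": task_run_id}
        )
        for task_run_id in task_run_ids(run_info)
    }
```

A fix in the provider would presumably follow the same shape inside `_handle_databricks_operator_execution`: detect a multi-task run and call `get_run_output` once per task run id instead of once for the job run id.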

Error

File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 355, in _do_api_call
    for attempt in self._get_retry_object():
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/tenacity/__init__.py", line 382, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 365, in _do_api_call
    response.raise_for_status()
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/operators/databricks.py", line 375, in execute
    _handle_databricks_operator_execution(self, hook, self.log, context)
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/operators/databricks.py", line 90, in _handle_databricks_operator_execution
    run_output = hook.get_run_output(operator.run_id)
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks.py", line 280, in get_run_output
    run_output = self._do_api_call(OUTPUT_RUNS_JOB_ENDPOINT, json)
  File "/home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 371, in _do_api_call
    raise AirflowException(
airflow.exceptions.AirflowException: Response: b'{"error_code":"INVALID_PARAMETER_VALUE","message":"Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}', Status Code: 400
[2023-01-10, 05:15:12 IST] {taskinstance.py} INFO - Marking task as FAILED. dag_id=experiment_metrics_store_experiment_4, task_id=, execution_date=20230109T180804, start_date=20230109T180810, end_date=20230109T181512
[2023-01-10, 05:15:13 IST] {warnings.py} WARNING - /home/ubuntu/.venv/airflow/lib/python3.8/site-packages/airflow/utils/email.py:119: PendingDeprecationWarning: Fetching SMTP credentials from configuration variables will be deprecated in a future release. Please set credentials using a connection instead.
  send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)

Apache Airflow version

2.3.2

Operating System

macos

Deployment

Other

Deployment details

No response

What happened

No response

What you think should happen instead

No response

How to reproduce

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
