Conversation

@vincbeck (Contributor):

Add a retry option to RedshiftDeleteClusterOperator so it can retry when an operation is already running on the cluster. In that case the deletion fails and throws an exception like the one below. This option lets users retry in such a scenario.

INFO    airflow.task:taskinstance.py:1278 Executing <Task(RedshiftDeleteClusterOperator): delete_cluster> on 2021-01-01 00:00:00+00:00
INFO    airflow.task:taskinstance.py:1487 Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=example_redshift_to_s3
AIRFLOW_CTX_TASK_ID=delete_cluster
AIRFLOW_CTX_EXECUTION_DATE=2021-01-01T00:00:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=backfill__2021-01-01T00:00:00+00:00
INFO     airflow.hooks.base:base.py:71 Using connection ID 'aws_default' for task execution.
INFO    botocore.credentials:credentials.py:1180 Found credentials in environment variables.
ERROR   airflow.task:taskinstance.py:1746 Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/airflow/providers/amazon/aws/operators/redshift_cluster.py", line 492, in execute
    final_cluster_snapshot_identifier=self.final_cluster_snapshot_identifier,
  File "/opt/airflow/airflow/providers/amazon/aws/hooks/redshift_cluster.py", line 112, in delete_cluster
    FinalClusterSnapshotIdentifier=final_cluster_snapshot_identifier,
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 495, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 914, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidClusterStateFault: An error occurred (InvalidClusterState) when calling the DeleteCluster operation: There is an operation running on the Cluster. Please try to delete it at a later time.
INFO     airflow.task:taskinstance.py:1301 Marking task as FAILED. dag_id=example_redshift_to_s3, task_id=delete_cluster, execution_date=20210101T000000, start_date=20221113T120403, end_date=20221113T120403
ERROR   airflow.executors.debug_executor.DebugExecutor:debug_executor.py:85 Failed to execute task: An error occurred (InvalidClusterState) when calling the DeleteCluster operation: There is an operation running on the Cluster. Please try to delete it at a later time..
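The behavior being added can be sketched roughly as follows. This is an illustrative, self-contained sketch, not the operator's actual code: the function and exception names, and the `attempts`/`attempt_interval` parameters, are assumptions for illustration (the parameter names follow the naming proposed later in this thread).

```python
import time


class InvalidClusterStateError(Exception):
    """Stand-in for botocore's InvalidClusterStateFault (illustrative only)."""


def delete_with_retry(delete_fn, attempts: int = 10, attempt_interval: float = 30.0):
    """Call delete_fn, retrying while the cluster reports a running operation.

    delete_fn is any zero-argument callable that performs the deletion and
    raises InvalidClusterStateError while another operation is in progress.
    """
    for attempt in range(attempts):
        try:
            return delete_fn()
        except InvalidClusterStateError:
            # On the last attempt, give up and surface the error to the task.
            if attempt == attempts - 1:
                raise
            # Otherwise wait and try the deletion again.
            time.sleep(attempt_interval)
```

In the real operator the callable would wrap the hook's boto3 `delete_cluster` call and the caught exception would be the botocore fault shown in the traceback above.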

Comment on lines 505 to 506
retry: bool = False,
retry_attempts: int = 10,

Member:

I wonder if we should just have one tries: int = 1; having two separate arguments for this feels like a potential tripwire.
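The two signatures under discussion can be contrasted in a small sketch (hypothetical stubs, returning only the effective number of attempts; neither is the operator's real signature):

```python
def delete_cluster(retry: bool = False, retry_attempts: int = 10) -> int:
    """Two-argument form from the diff above: the flag gates the count."""
    # Potential tripwire: passing retry_attempts=5 alone does nothing
    # unless retry=True is also set.
    return retry_attempts if retry else 1


def delete_cluster_single(tries: int = 1) -> int:
    """Single-argument form: tries=1 means no retry, tries>1 enables it."""
    return tries
```

With the single argument there is no inconsistent state to guard against, at the cost of users having to pick an attempt count themselves when they opt in.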

Contributor Author:

That's a good idea. The only downside is that there is no default value for when you want to retry: if you do, you have to come up with a number of retries yourself, which may not be very intuitive for users.

Contributor:

I think we need some standardization (see #27276 (comment)). It's better to decide on a convention and then enforce it across the board; discussing this per operator is less productive, and we may eventually end up with different names per operator.

Contributor Author:

I am not sure it is the same use case, though. In Syed's PR the retry logic works around a bug in boto3; here it handles a valid use case: deleting a Redshift cluster that has an operation still running. That's why, in my opinion, we need a flag to toggle the retry: some users might want the operator to fail in that scenario, while others might want to retry. Regarding the naming, I am happy to rename the parameters to attempts and attempt_interval to make them more consistent across operators.

Contributor:

Unless I'm missing something, the goal of these parameters, from the user's perspective, is to retry without counting against Airflow's task-level retries, so I do find this to be the same case (regardless of the reason the functionality was needed to begin with).
WDYT?

Contributor Author:

You are correct, but the way I see it, one is a bug fix (through a hack) and the other is a feature.

Contributor Author (@vincbeck, Nov 22, 2022):

But maybe you are right, and in the end we should always retry, since most likely everyone wants to retry when the deletion fails because of a running operation on the cluster.

Contributor Author:

Should be good now @eladkal

@potiuk potiuk merged commit 2ab5c1f into apache:main Nov 26, 2022
@vincbeck vincbeck deleted the vincbeck/redshift_delete branch December 6, 2022 16:36