Azure Spot Containers Stuck in Unhealthy-Repairing Cycle #47023

@oli2tup

Description

Apache Airflow Provider(s)

microsoft-azure

Versions of Apache Airflow Providers

8.3.0

Apache Airflow version

2.7.3

Operating System

Ubuntu 20.04

Deployment

Other

Deployment details

Airflow running on a VM hosted in Azure

What happened

We are experiencing an issue with Azure Spot Containers whose status continuously cycles through Unhealthy → Repairing → Running without ever executing any tasks.

  • When they return to the Running state, they remain idle and do not perform any actions.
  • Eventually, they go back to Unhealthy, repeating the cycle indefinitely.
  • Because they never stay in any one state for long, they can evade both container-level and Airflow timeouts.
  • Manually SSHing into a container that returns to the Running state after being Unhealthy fails. In our experience, the only thing that can be done with such a container is to terminate it.
  • It seems to affect about 10% of Spot containers in the EU West region.
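To illustrate the point about timeouts: if a check only measures how long a container stays in a single state, every short stint can stay under the limit while the total runtime keeps growing. The sketch below uses a made-up timeline (not measured data), a hypothetical `StateEvent` record, and a hypothetical 300-second per-state limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateEvent:
    # Hypothetical record: the container entered `state` at time `at` (seconds).
    state: str
    at: float


def state_durations(events, now):
    """Return (longest time spent in any single state, total elapsed time)."""
    longest = 0.0
    # Pair each event with the next one; the last state lasts until `now`.
    for current, nxt in zip(events, events[1:] + [StateEvent("<now>", now)]):
        longest = max(longest, nxt.at - current.at)
    total = now - events[0].at
    return longest, total


# Illustrative timeline only, not measurements from this issue.
events = [
    StateEvent("Running", 0), StateEvent("Unhealthy", 240),
    StateEvent("Repairing", 420), StateEvent("Running", 600),
    StateEvent("Unhealthy", 840), StateEvent("Repairing", 1020),
    StateEvent("Running", 1200),
]
PER_STATE_TIMEOUT = 300  # hypothetical per-state limit, in seconds
longest, total = state_durations(events, now=1440)
print(longest < PER_STATE_TIMEOUT, total)  # True 1440
```

No single state lasts longer than 240 seconds, so a per-state limit never fires, even though the container has been occupied for 1440 seconds without doing useful work.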

What you think should happen instead

Ideally, the container should be forcefully terminated when it enters the Unhealthy state to prevent this looping behaviour.
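A minimal sketch of the proposed behaviour, assuming the caller can poll the container's state and delete it. `get_state` and `terminate` are hypothetical callables that would wrap whatever Azure client is in use; "Unhealthy" comes from this issue, while the terminal state names are assumptions. The overall deadline is fixed once at the start, so bouncing between states cannot reset it.

```python
import time
from typing import Callable

# Assumed terminal state names; "Unhealthy" is the state reported in this issue.
TERMINAL_STATES = {"Succeeded", "Failed"}


def watch_container(
    get_state: Callable[[], str],
    terminate: Callable[[], None],
    overall_timeout: float,
    poll_interval: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the container finishes, turns Unhealthy, or exceeds overall_timeout.

    Returns "unhealthy", "timeout", or the lower-cased terminal state.
    """
    # One deadline for the whole run: state changes never reset it.
    deadline = clock() + overall_timeout
    while True:
        state = get_state()
        if state == "Unhealthy":
            terminate()  # don't wait for Repairing -> Running
            return "unhealthy"
        if state in TERMINAL_STATES:
            return state.lower()
        if clock() >= deadline:
            terminate()
            return "timeout"
        sleep(poll_interval)


# Usage with fake callables (no Azure access needed):
states = iter(["Running", "Running", "Unhealthy"])
terminated = []
reason = watch_container(
    get_state=lambda: next(states),
    terminate=lambda: terminated.append(True),
    overall_timeout=3600,
    sleep=lambda _: None,
)
print(reason, terminated)  # unhealthy [True]
```

Injecting `clock` and `sleep` keeps the logic testable without waiting in real time.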

How to reproduce

Since the issue occurs randomly, there is no single code snippet that reproduces it consistently. However, the following steps can increase the likelihood of encountering it:

  • Deploy multiple Azure Spot Containers running Airflow tasks.
  • Run tasks during peak hours (e.g., in the EU West region) to increase the chances of hitting the issue.
  • Monitor container lifecycle events to check whether they enter an Unhealthy → Repairing → Running loop.
  • (Optional) Find a way to manually spoof the container's status as "Unhealthy".
  • Try to SSH into a container that returns to the "Running" state after being Unhealthy; it should fail.

It is difficult to force it to happen on demand.
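For the monitoring step, a sketch like the following could flag the loop from a series of polled states. `count_repair_cycles` is a hypothetical helper, and the sample list is invented for illustration. Because polling sees the same state several times in a row, consecutive repeats are collapsed before matching the Unhealthy → Repairing → Running pattern.

```python
from itertools import groupby

LOOP = ("Unhealthy", "Repairing", "Running")


def count_repair_cycles(samples):
    """Count Unhealthy -> Repairing -> Running cycles in a list of polled states."""
    # Keep one entry per run of identical consecutive readings.
    transitions = [state for state, _ in groupby(samples)]
    cycles = 0
    for i in range(len(transitions) - len(LOOP) + 1):
        if tuple(transitions[i:i + len(LOOP)]) == LOOP:
            cycles += 1
    return cycles


# Invented polling samples, for illustration only.
samples = ["Running", "Running", "Unhealthy", "Repairing", "Repairing", "Running",
           "Unhealthy", "Unhealthy", "Repairing", "Running"]
print(count_repair_cycles(samples))  # 2
```

Any count of one or more indicates the container has entered the loop at least once and is a candidate for termination.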

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Assignees

No one assigned

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
