PodDisruptionBudget "maxUnavailable" doesn't prevent downtime #1706

@varnastadeus

Description

Currently, the PodDisruptionBudget allows configuring only maxUnavailable, which doesn't prevent downtime while shard pods are being evicted. This is reproducible with a simple example: a single shard with 2 replicas and the default maxUnavailable: 1.

    layout:
      shards:
      - internalReplication: "true"
        replicas:
        - templates:
            podTemplate: clickhouse-in-zone-a
        - templates:
            podTemplate: clickhouse-in-zone-b

When both pods are evicted at the same time, there is a short window during which the second pod can be evicted while the first one is already gone.

The status of the PDB before the eviction starts:

{
  "conditions": [
    {
      "lastTransitionTime": "2025-05-11T16:05:17Z",
      "message": "",
      "observedGeneration": 1,
      "reason": "SufficientPods",
      "status": "True",
      "type": "DisruptionAllowed"
    }
  ],
  "currentHealthy": 2, // <- main part
  "desiredHealthy": 1, // <- main part
  "disruptionsAllowed": 1, // <- main part
  "expectedPods": 2,
  "observedGeneration": 1
}

The status of the PDB after the first pod was deleted:

{
  "conditions": [
    {
      "lastTransitionTime": "2025-05-11T16:05:58Z",
      "message": "",
      "observedGeneration": 1,
      "reason": "SufficientPods",
      "status": "True",
      "type": "DisruptionAllowed"
    }
  ],
  "currentHealthy": 1, // <- main part
  "desiredHealthy": 0, // <- main part
  "disruptionsAllowed": 1, // <- main part
  "expectedPods": 1,
  "observedGeneration": 1
}

Notice that at some point the PDB observes "expectedPods": 1. When that happens, the second pod is allowed to be evicted because, based on the current state, evicting it does not violate the disruption budget.
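The arithmetic behind this race can be sketched as follows (a simplified illustration of how disruptionsAllowed is derived for a maxUnavailable PDB, not the actual kube-controller-manager code):

```python
# Simplified sketch of how disruptionsAllowed is derived for a
# maxUnavailable: 1 PDB (illustration only, not the real disruption
# controller implementation).

def disruptions_allowed(expected_pods: int, current_healthy: int,
                        max_unavailable: int = 1) -> int:
    # With maxUnavailable, desiredHealthy is computed from the pods the
    # PDB currently observes, not from the intended replica count.
    desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)

# Before the eviction: 2 pods observed, both healthy.
print(disruptions_allowed(expected_pods=2, current_healthy=2))  # 1

# After the 1st pod is deleted, the PDB observes only 1 pod, so
# desiredHealthy drops to 0 and the 2nd eviction is still allowed.
print(disruptions_allowed(expected_pods=1, current_healthy=1))  # 1
```

This matches the two statuses above: because expectedPods shrinks together with the evictions, disruptionsAllowed never reaches 0 and the budget never actually blocks the second eviction.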

Based on the Kubernetes docs, maxUnavailable cannot be used with the pod setup we have:

https://kubernetes.io/docs/tasks/run-application/configure-pdb/#arbitrary-controllers-and-selectors

You can use a PDB with pods controlled by another resource, by an "operator", or bare pods, but with these restrictions:

- only .spec.minAvailable can be used, not .spec.maxUnavailable.
- only an integer value can be used with .spec.minAvailable, not a percentage.

Additionally, the eviction API disallows eviction of any pod covered by multiple PDBs, and the docs advise avoiding overlapping selectors, so we can't define multiple PDBs either.

Problem

  • maxUnavailable cannot be used reliably when managing arbitrary pods
  • creation of the default PDB is not configurable, which prevents customization

Solution

  • allow disabling default PDB creation, so the user can define PDBs manually
    Or
  • create a minAvailable PDB per shard instead
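A per-shard PDB along the lines of the second option could look like this (a hypothetical sketch; the name and selector labels are assumptions and would depend on the shard-scoped labels the operator actually applies to its pods):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: clickhouse-shard-0        # hypothetical: one PDB per shard
spec:
  minAvailable: 1                 # integer only; percentages are not
                                  # allowed for arbitrary controllers
  selector:
    matchLabels:
      # assumed labels; substitute whatever shard-scoped labels the
      # operator sets on its pods
      clickhouse.altinity.com/chi: my-installation
      clickhouse.altinity.com/shard: "0"
```

With minAvailable: 1 on a shard with 2 replicas, the budget keeps blocking the second eviction even as expectedPods shrinks, which avoids the race described above.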
