
Bug: Zarf fails to init successfully if there's a pod that's currently stuck in terminating state and the node on that terminating pod doesn't exist anymore #2410

@zack-is-cool

Description

Zarf fails to init successfully if a pod is stuck in the Terminating state and the node that pod was scheduled on no longer exists.

I think this chunk of code is related: it doesn't seem to check whether a pod is in a bad state, e.g. terminating, with the node it was supposed to run on possibly gone 😬

https://github.com/defenseunicorns/zarf/blob/08c92e12a4a2b05d0ea5abe055c7c01ba9964051/src/pkg/cluster/injector.go#L453-L466

I know this is a very specific scenario, but it happened:

  • We had an EKS cluster and were upgrading it from 1.27 -> 1.29
  • Rolled all nodes and the EKS cluster version to 1.29
  • A pod got stuck in Terminating
  • That terminating pod's pod.Spec.NodeName points to a node that no longer exists, because the nodes were rolled to pick up the new EKS Kubernetes version
  • Zarf calls GetNode for that node against the Kubernetes API, which errors out
  • Zarf prints the error: Unable to generate a list of candidate images to perform the registry injection

Metadata

Labels: bug 🐞 (Something isn't working)
