`zarf init` fails if the cluster has a pod stuck in the Terminating state and the node that pod was scheduled on no longer exists.
I think this chunk of code is related. It doesn't seem to check whether a pod is in a bad state, i.e. terminating, possibly with the node it was supposed to live on now gone 😬
https://github.com/defenseunicorns/zarf/blob/08c92e12a4a2b05d0ea5abe055c7c01ba9964051/src/pkg/cluster/injector.go#L453-L466
I know this is a very specific scenario, but it happened:
- We had an EKS cluster and were upgrading it from 1.27 -> 1.29
- Rolled all nodes and the EKS cluster version to 1.29
- A pod got stuck in terminating
- This terminating pod's `pod.Spec.NodeName` points to a node that doesn't exist anymore because nodes were rolled to get the new EKS k8s version
- Zarf tries to run `GetNode` on this node from the k8s API, and that errors out
- Zarf prints the error: `Unable to generate a list of candidate images to perform the registry injection`