Description
Is your feature request related to a problem? Please describe.
Recently, I had a k3s cluster get corrupted after master nodes were deleted. Even the zarf-docker-registry was corrupted, and I lost it along with the images it contained. I ran a `zarf init`, and the injector kept trying to clone pods that were in ImagePullBackOff or similar states. The timeout before it moved on to another pod was long, so it took an enormous amount of time to get the injector pod started correctly, plus some manual finagling with taints to land it on a node that had the fewest pods in error states.
Describe the solution you'd like
- Given an existing cluster with many pods running, but also many that are in error states (like ImagePullBackOff)
- When running `zarf init`
- Then the injector filters out all pods except those that are healthy and in the "running" state when choosing one to clone
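
To illustrate the filtering I have in mind, here is a minimal sketch in Python. It is not how the injector is implemented; it assumes pods are available as plain dictionaries shaped like the Kubernetes pod status (`status.phase`, `status.containerStatuses[].ready`), and the function names `is_healthy` and `healthy_pods` are made up for this example.

```python
from typing import Any, Dict, List

Pod = Dict[str, Any]


def is_healthy(pod: Pod) -> bool:
    """Return True only for pods that are Running with every container ready."""
    status = pod.get("status", {})
    # Pods stuck pulling images usually sit in the Pending phase.
    if status.get("phase") != "Running":
        return False
    container_statuses = status.get("containerStatuses", [])
    # A pod with no reported container statuses cannot be trusted as a clone source.
    if not container_statuses:
        return False
    return all(cs.get("ready", False) for cs in container_statuses)


def healthy_pods(pods: List[Pod]) -> List[Pod]:
    """Keep only the pods that are safe candidates to clone."""
    return [pod for pod in pods if is_healthy(pod)]


# Illustrative data only: one healthy pod, one stuck pulling, one crash-looping.
example_pods = [
    {"metadata": {"name": "ok"},
     "status": {"phase": "Running",
                "containerStatuses": [{"ready": True}]}},
    {"metadata": {"name": "stuck-pull"},
     "status": {"phase": "Pending",
                "containerStatuses": [
                    {"ready": False,
                     "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
    {"metadata": {"name": "crashing"},
     "status": {"phase": "Running",
                "containerStatuses": [{"ready": False}]}},
]

candidate_names = [p["metadata"]["name"] for p in healthy_pods(example_pods)]
```

With this check, only the first example pod would be considered as a clone candidate, so the injector would not spend a full timeout on pods that are already known to be broken.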
Describe alternatives you've considered
An alternative might be to specify which pod and/or node to clone from via an environment variable or `--set`.
Additional context
The timeout for an injector pod to reach the "running" state could also be lowered, or made overridable by an environment variable or `--set`.
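
A rough sketch of the override behaviour, again in Python and purely illustrative: the variable name `ZARF_INJECTOR_TIMEOUT_SECONDS` and the 60 second default are hypothetical and do not correspond to any existing Zarf setting.

```python
import os
from typing import Mapping, Optional

# Hypothetical default; not taken from Zarf.
DEFAULT_TIMEOUT_SECONDS = 60.0
# Hypothetical variable name, used only for this example.
TIMEOUT_ENV_VAR = "ZARF_INJECTOR_TIMEOUT_SECONDS"


def injector_timeout_seconds(
    env: Optional[Mapping[str, str]] = None,
    default: float = DEFAULT_TIMEOUT_SECONDS,
) -> float:
    """Read the timeout from the environment, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(TIMEOUT_ENV_VAR)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Unparseable values fall back instead of aborting the init.
        return default
    # Zero or negative timeouts make no sense, so ignore them too.
    return value if value > 0 else default
```

Falling back to the default on a bad value is one possible design choice; failing fast with a clear error message would be equally reasonable.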