Description
What happened:
In a large-scale cluster, multiple pods can stay in the terminating state for a long time (because of node faults, application faults, or delayed manual handling). A pod waiting to be scheduled remains in the pending state: Volcano enters the pipeline state and waits for the terminating pods to release their resources, so the pod cannot run for a long time.
Even with autoscaler scale-out enabled and configured to add capacity based on pending pods in the cluster, scale-out is not triggered in this situation.
What you expected to happen:
When pending pods exist in the cluster during Volcano scheduling and the autoscaler is enabled, nodes should be added automatically so that the pending pods can be scheduled, regardless of whether Volcano has entered the pipeline state.
How to reproduce it (as minimally and precisely as possible):
- Install the autoscaler component and configure it to scale out based on pending pods.
- Manually put a pod into the terminating state.
- Schedule a pod that requests the same amount of resources as the terminating pod.
Anything else we need to know?:
The autoscaler scales out for a pending pod only when the pod it watches is in the pending state and the reason is Unschedulable.
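
The check described above can be sketched roughly as follows. This is a minimal illustration, not the autoscaler's actual code: the `Pod` and `PodCondition` types are simplified, hypothetical stand-ins for the Kubernetes API types, `triggersScaleUp` is an invented name, and the use of the `PodScheduled` condition type is an assumption based on the usual Kubernetes convention. The second pod models the assumed situation in this issue, where the pod is pending but carries no Unschedulable reason.

```go
package main

import "fmt"

// PodCondition is a simplified, hypothetical stand-in for the
// Kubernetes pod condition type, kept minimal so the sketch is self-contained.
type PodCondition struct {
	Type   string // e.g. "PodScheduled"
	Status string // "True", "False", or "Unknown"
	Reason string // e.g. "Unschedulable"
}

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	Name       string
	Phase      string // e.g. "Pending", "Running"
	Conditions []PodCondition
}

// triggersScaleUp reports whether a pod matches the condition described
// in this issue: pending, and not scheduled with reason Unschedulable.
func triggersScaleUp(p Pod) bool {
	if p.Phase != "Pending" {
		return false
	}
	for _, c := range p.Conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

func main() {
	unschedulable := Pod{
		Name:  "unschedulable-pod",
		Phase: "Pending",
		Conditions: []PodCondition{
			{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"},
		},
	}
	// Assumption: a pod waiting on terminating pods is pending
	// but has no Unschedulable reason set.
	waiting := Pod{Name: "waiting-pod", Phase: "Pending"}

	fmt.Println(triggersScaleUp(unschedulable)) // true
	fmt.Println(triggersScaleUp(waiting))       // false
}
```

Under these assumptions, the waiting pod never satisfies the check, which would be consistent with scale-out not being triggered.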
Environment:
- Volcano Version: master
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others: