Commit 9c2122b

KEP-5007: Update docs for DRADeviceBindingConditions in v1.36
Signed-off-by: Tsubasa Watanabe <[email protected]>
1 parent db8de23 commit 9c2122b

2 files changed

Lines changed: 36 additions & 25 deletions

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 32 additions & 25 deletions
@@ -948,7 +948,7 @@ Resource pool status is an *alpha feature* and only enabled when the
 [`DRAResourcePoolStatus` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRAResourcePoolStatus)
 is enabled in the kube-apiserver and kube-controller-manager.

-### Device Binding Conditions {#device-binding-conditions}
+### Device binding conditions

 {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

@@ -970,13 +970,16 @@ following fields in the `Device` section of a `ResourceSlice`. Cluster administr
 must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
 gates for the scheduler to honor these fields.

-- `bindingConditions`: A list of condition types that must be set to True in the
-status.conditions field of the associated ResourceClaim before the Pod can be bound.
-These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
-- `bindingFailureConditions`: A list of condition types that, if set to True in
+`bindingConditions`
+: A list of _condition types_ that must be set to True (in the `.status.conditions` field of the associated ResourceClaim) before the Pod can be bound. These conditions typically represent readiness signals, such as DeviceAttached or DeviceInitialized.
+
+`bindingFailureConditions`
+: A list of condition types that, if set to True in
 status.conditions field of the associated ResourceClaim, indicate a failure state.
 If any of these conditions are True, the scheduler will abort binding and reschedule the Pod.
-- `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
+
+`bindsToNode`
+: if set to `true`, the scheduler records the selected node name in the
 `status.allocation.nodeSelector` field of the ResourceClaim.
 This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
 inside the ResourceClaim, which external controllers can use to perform node-specific
@@ -990,13 +993,32 @@ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`
 The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
 If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
 clears the allocation and reschedules the Pod.
-This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
+A cluster administration can configure this timeout duration by editing the kube-scheduler configuration file.
+
+An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
+
+```yaml
+apiVersion: kubescheduler.config.k8s.io/v1
+kind: KubeSchedulerConfiguration
+profiles:
+- schedulerName: default-scheduler
+  pluginConfig:
+  - name: DynamicResources
+    args:
+      apiVersion: kubescheduler.config.k8s.io/v1
+      kind: DynamicResourcesArgs
+      bindingTimeout: 60s
+```
+
+#### Example {#device-binding-conditions-example}
+
+Here is an example of a ResourceSlice that you might see in a cluster where there's a DRA driver in use, and that driver supports binding conditions:

 ```yaml
 apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 metadata:
-  name: gpu-slice
+  name: gpu-slice-1
 spec:
   driver: dra.example.com
   nodeSelector:
@@ -1036,24 +1058,9 @@ must be prepared (the `is-prepared` condition has a status of `True`) before bin
 - External controllers can use the node selector in the ResourceClaim to perform
 node-specific setup on the selected node.

-An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
-
-```yaml
-apiVersion: kubescheduler.config.k8s.io/v1
-kind: KubeSchedulerConfiguration
-profiles:
-- schedulerName: default-scheduler
-  pluginConfig:
-  - name: DynamicResources
-    args:
-      apiVersion: kubescheduler.config.k8s.io/v1
-      kind: DynamicResourcesArgs
-      bindingTimeout: 60s
-```
-
-Device binding conditions is an *alpha feature* and only enabled when the
+Device binding conditions is a *beta feature* and is enabled by default, controlled by the
 [`DRADeviceBindingConditions` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRADeviceBindingConditions)
-is enabled in the kube-apiserver and kube-scheduler.
+in the kube-apiserver and kube-scheduler.

 ### Node allocatable resources {#node-allocatable-resources}

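For readers of this diff, the scheduler behavior described in the changed section (wait for every `bindingConditions` entry to be `True`, abort on any `bindingFailureConditions` entry that is `True`, give up after the timeout) can be summarized as a small decision function. The Python sketch below is only an illustration of that rule, not the kube-scheduler implementation. The `Condition` class, the `binding_decision` function, the returned strings, and the `DeviceAttachFailed` condition type are hypothetical names chosen for this example. Checking failure conditions before readiness, and readiness before the timeout, is also an assumption.

```python
from dataclasses import dataclass

# Default timeout stated in the documentation text in this diff.
DEFAULT_BINDING_TIMEOUT_SECONDS = 600


@dataclass(frozen=True)
class Condition:
    """Minimal stand-in for a Kubernetes condition (hypothetical; type and status only)."""
    type: str
    status: str  # "True", "False", or "Unknown"


def binding_decision(conditions, binding_conditions, binding_failure_conditions,
                     elapsed_seconds, timeout_seconds=DEFAULT_BINDING_TIMEOUT_SECONDS):
    """Return "bind", "wait", or "reschedule" for one allocated device.

    The order of the checks is an assumption made for this sketch.
    """
    true_types = {c.type for c in conditions if c.status == "True"}

    # Any failure condition that is True aborts binding.
    if true_types & set(binding_failure_conditions):
        return "reschedule"
    # Every binding condition must be True before the Pod can be bound.
    if set(binding_conditions) <= true_types:
        return "bind"
    # Not ready yet: give up once the timeout has been reached.
    if elapsed_seconds >= timeout_seconds:
        return "reschedule"
    return "wait"
```

For example, with `binding_conditions=["DeviceAttached", "DeviceInitialized"]`, a claim whose status only reports `DeviceAttached` as `True` yields `"wait"` until both are `True` or the timeout elapses.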
content/en/docs/reference/command-line-tools-reference/feature-gates/DRADeviceBindingConditions.md

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.34"
+    toVersion: "1.35"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.36"
 ---
 Enables support for DeviceBindingConditions in the DRA related fields.
 This allows for thorough device readiness checks and attachment processes before Bind phase.
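
To make the effect of the new `stages` entries concrete, the following Python sketch looks up which stage and default value apply to a given Kubernetes minor version. It is an illustration only: the `STAGES` list is copied from the front matter in this diff, while `parse_version` and `stage_for` are hypothetical helpers. The sketch assumes that `toVersion` is inclusive and that a missing `toVersion` means the stage is still current.

```python
# Stage list copied from the front matter shown in this diff.
STAGES = [
    {"stage": "alpha", "defaultValue": False, "fromVersion": "1.34", "toVersion": "1.35"},
    {"stage": "beta", "defaultValue": True, "fromVersion": "1.36"},
]


def parse_version(text):
    """Turn "1.36" into (1, 36) so versions compare numerically, not as strings."""
    major, minor = text.split(".")
    return int(major), int(minor)


def stage_for(version, stages=STAGES):
    """Return the stage entry covering `version`, or None if the gate did not exist yet."""
    wanted = parse_version(version)
    for entry in stages:
        start = parse_version(entry["fromVersion"])
        # Assumption: no toVersion means the stage has no end yet.
        end = parse_version(entry["toVersion"]) if "toVersion" in entry else None
        if wanted >= start and (end is None or wanted <= end):
            return entry
    return None
```

Under these assumptions, `stage_for("1.35")` returns the alpha entry (default `False`) and `stage_for("1.36")` returns the beta entry (default `True`), which matches the "enabled by default" wording added to the concept page.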
