-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DynamicProvisioningScheduling support for GCE PD and RePD #67530
Conversation
/ok-to-test |
/test pull-kubernetes-e2e-gce-device-plugin-gpu |
pkg/volume/gce_pd/gce_util.go
Outdated
var activezones sets.String | ||
activezones, err = cloud.GetAllCurrentZones() | ||
if err != nil { | ||
glog.V(2).Infof("error getting zone information from GCE: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be logged since it will be returned as an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in the original code but unnecessary. Will remove.
}, | ||
}, | ||
Reject: true, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another negative test case could be not enough active zones
pkg/volume/util/util_test.go
Outdated
Reject: true, | ||
}, | ||
|
||
// Zones parameter specified do not match ReplicaCount [Fail] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's technically "not enough" replicaCount
pkg/volume/util/util_test.go
Outdated
}, | ||
|
||
// Select zones from node label and AllowedTopologies [Pass] | ||
// [1] nil Node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not correct
pkg/volume/util/util_test.go
Outdated
}, | ||
Reject: false, | ||
ExpectSpecificZones: true, | ||
ExpectedZones: "zoneX,zoneY", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpectedZone
pkg/volume/util/util_test.go
Outdated
}, | ||
|
||
// Select zones from node label and AllowedTopologies [Pass] | ||
// [1] nil Node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not correct
pkg/volume/util/util_test.go
Outdated
ReplicaCount: 2, | ||
Reject: false, | ||
ExpectSpecificZones: true, | ||
ExpectedZones: "zoneW,zoneX", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know ExpectedZone is redundant when ExpectSpecificZones is true, but may be good to explicitly call out anyway.
pkg/volume/util/util_test.go
Outdated
}, | ||
Reject: false, | ||
ExpectSpecificZones: true, | ||
ExpectedZones: "zoneX,zoneY", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpectedZone
/assign @verult |
lgtm! will also let @verult review |
/test pull-kubernetes-kubemark-e2e-gce-big |
pkg/volume/gce_pd/gce_util.go
Outdated
configuredZones = v | ||
configuredZones, err = volumeutil.ZonesToSet(v) | ||
if err != nil { | ||
return "", 0, nil, "", fmt.Errorf("error parsing zones %s, must be strings separated by commas: %v", v, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error message seems to belong more inside the "ZonesToSet" method since its pretty specific, then it can just bubble up to here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the error to ZonesToSet
. Will adjust the error message in other consumers (like EBS/Azure) in a separate PR.
pkg/volume/util/util.go
Outdated
} | ||
|
||
// pick zone from allowedZones if specified | ||
allowedZones, err := ZonesFromAllowedTopologies(allowedTopologies) | ||
if err != nil { | ||
return "", err | ||
return sets.NewString(), err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when returning error I think you can just return nil
for the sets.String
value
pkg/volume/util/util.go
Outdated
} | ||
return ChooseZoneForVolume(*allowedZones, pvcName), nil | ||
return chooseZonesForVolumeIncludingZone(*allowedZones, pvcName, zoneFromNode, "allowedTopologies", numReplicas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we dereferencing allowedZones
here. I'm not sure we need to pass in "allowedTopologies" here, its only used in an error message in the function. But if you absolutely have to have it, make it a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clearly missed the fact that nil
can be used to indicate an empty set. Thanks for pointing that above. With that, no dereferencing is necessary and I will fix the code around this.
I was using the label to be able to print out meaningful error messages without having to process the errors in the callers. However I am not sure that saves me much - so I will get rid of this label and error string inside chooseZonesForVolumeIncludingZone
and process the errors in the callers.
@@ -1460,6 +1460,659 @@ func TestSelectZoneForVolume(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestSelectZonesForVolume(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add unit tests for: chooseZonesForVolumeIncludingZone
/test pull-kubernetes-integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial pass
pkg/volume/gce_pd/gce_pd.go
Outdated
if len(labels) != 0 { | ||
if pv.Labels == nil { | ||
pv.Labels = make(map[string]string) | ||
} | ||
for k, v := range labels { | ||
pv.Labels[k] = v | ||
requirements = append(requirements, v1.NodeSelectorRequirement{Key: k, Operator: v1.NodeSelectorOpIn, Values: []string{v}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For regional PD, the label value would be something like us-central1-a__us-central1-b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this.
pkg/volume/gce_pd/gce_pd.go
Outdated
@@ -526,7 +527,17 @@ func (c *gcePersistentDiskProvisioner) Provision(selectedNode *v1.Node, allowedT | |||
} | |||
for k, v := range labels { | |||
pv.Labels[k] = v | |||
requirements = append(requirements, v1.NodeSelectorRequirement{Key: k, Operator: v1.NodeSelectorOpIn, Values: []string{v}}) | |||
var values []string | |||
if k == kubeletapis.LabelZoneFailureDomain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this have a unit test?
pkg/volume/gce_pd/gce_pd.go
Outdated
} | ||
} | ||
|
||
if utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check whether requirements
is empty
var activezones sets.String | ||
activezones, err = cloud.GetAllCurrentZones() | ||
if err != nil { | ||
return "", 0, nil, "", err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Keep the same glog message as before: glog.V(2).Infof("error getting zone information from GCE: %v", err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a review comment earlier to remove the log as the error already captures it. #67530 (comment) . Do we really need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, we should avoid logging error messages that we also return because that creates duplicate log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
} | ||
|
||
// pick zone from parameters if present | ||
if zoneParameterPresent { | ||
return zoneParameter, nil | ||
if numReplicas > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The volumeutil.ValidateZone()
logic is missing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary? All ValidateZone
is doing is making sure the string does not have spaces/tabs. The cloud API call for PD creation will fail for an invalid zone like that eventually. Maybe I can invoke ValidateZone
on the list of returned zones from SelectZonesForVolume
if it is important.
pkg/volume/gce_pd/gce_util.go
Outdated
if zonePresent && zonesPresent { | ||
return "", 0, nil, "", fmt.Errorf("the 'zone' and 'zones' StorageClass parameters must not be used at the same time") | ||
} | ||
|
||
if replicationType == replicationTypeRegionalPD && zonePresent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is no longer needed, because it's captured by line 388 in pkg/volume/util/util.go, inside SelectZonesForVolume()
?
pkg/volume/util/util.go
Outdated
} | ||
|
||
// pick zone from parameters if present | ||
if zoneParameterPresent { | ||
return zoneParameter, nil | ||
if numReplicas > 1 { | ||
return sets.NewString(), fmt.Errorf("zone cannot be specified if desired number of replicas for pv is greather than 1. Please specify zones or allowedTopologies to specify desired zones") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil to be consistent
pkg/volume/util/util.go
Outdated
// chooseZonesForVolumeIncludingZone is a wrapper around ChooseZonesForVolume that ensures zoneToInclude is chosen | ||
func chooseZonesForVolumeIncludingZone(zones sets.String, pvcName, zoneToInclude string, numReplicas uint32) (sets.String, error) { | ||
if uint32(zones.Len()) < numReplicas { | ||
return sets.NewString(), fmt.Errorf("not enough zones found to provision a volume with %d replicas. Need at least %d distinct zones for a volume with %d replicas", numReplicas, numReplicas, numReplicas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil
to be consistent with the rest of SelectZonesForVolume()
.
if uint32(zones.Len()) == numReplicas { | ||
return zones, nil | ||
} | ||
if zoneToInclude != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if zoneToInclude
is not in zones
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this function assumes zoneToInclude
is always in zones
, it'd be helpful to call it out in comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here makes sure zoneToInclude
and zones
is mutually exclusive. If zoneToInclude
is set, zones
should not be specified and that is what the logic here detects and error below calls out. To specify a list of "allowed" zones, the error also instructs that allowedTopologies
should be specified with the list of allowed zones since that is integrated into the scheduler while zones
is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you are referring to a different function? Both calls to choozeZonesForVolumeIncludingZone()
pass in non-empty zones
and they both allow for the possibility of non-empty zoneToInclude
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I was indeed referring to a different function. Agree with your comment above - choozeZonesForVolumeIncludingZone()
does expect zoneToInclude
to be part of zones
and I will clarify that in a comment.
pkg/volume/util/util.go
Outdated
return "", fmt.Errorf("zone[s] cannot be specified in StorageClass if allowedTopologies specified") | ||
return nil, fmt.Errorf("zone[s] cannot be specified in StorageClass if allowedTopologies specified") | ||
} | ||
if zones, err := chooseZonesForVolumeIncludingZone(allowedZones, pvcName, zoneFromNode, numReplicas); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate that zoneFromNode
is inside allowedZones
before this call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary since the scheduler guarantees that. I can call out the assumption explicitly in a comment.
} | ||
// if single replica volume and node with zone found, return immediately | ||
if numReplicas == 1 { | ||
return sets.NewString(zoneFromNode), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zoneFromNode
could be outside allowed zones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you are referring to as allowed zones here? If you are referring to the zones
parameter, then zonesParameterPresent
need to be set and we check and make sure zonesParameterPresent
is not specified in this code path up above in L349 (since the zones parameter is not integrated well with the scheduler). Unit test Node_with_Zone_labels_and_Zones_parameter_present
covers this as well.
If you are referring to zones from allowedTopologies
, then the scheduler guarantees the node selected has a zone (i.e. zoneFromNode
) that is part of allowedTopologies
. Maybe I can add a comment here to clarify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I was referring to allowedTopologies
. Thanks for the explanation!
// SelectZonesForVolume selects zones for a volume based on several factors: | ||
// node.zone, allowedTopologies, zone/zones parameters from storageclass, | ||
// zones with active nodes from the cluster. The number of zones = replicas. | ||
func SelectZonesForVolume(zoneParameterPresent, zonesParameterPresent bool, zoneParameter string, zonesParameter, zonesWithNodes sets.String, node *v1.Node, allowedTopologies []v1.TopologySelectorTerm, pvcName string, numReplicas uint32) (sets.String, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is it possible to do without some of these parameters? If we can reduce the input set, it'll help guarantee test coverage and possibly improve code maintainability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since a lot of these parameters are mutually exclusive to each other, it might be worth having a SelectZonesForVolume
that only takes in one of these params called zones
, and a wrapper function that does all this logic to determine mutual exclusivity as well as picking which of the zone|zones|topology
to pass to SelectZonesForVolume
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@verult The idea behind SelectZonesForVolume
is all the in-tree providers like EBS, Azure and GCE PD should be able to parse out volume zone and topology parameters from the storage class, invoke SelectZoneForVolume
/SelectZonesForVolume
and get a (set of) volume(s) back to pass to the cloud provider API without having to duplicate any of the logic to enforce the rules. So SelectZonesForVolume
is the wrapper that implements all the common logic to determine mutual exclusivity and for that it needs the full set of parameters around zone/zones/node/topology. I agree that the long list of parameters is not pretty but to help with maintenance, we have an extensive set of unit tests in TestSelectZonesForVolume
and TestSelectZoneForVolume
. Those should make sure none of the rules around exclusivity of the parameters are broken.
@davidz627 SelectZonesForVolume
is the wrapper that enforces the compatibility rules among the parameters. chooseZonesForVolumeIncludingZone
/ChooseZonesForVolume
invoked from SelectZonesForVolume
does the actual picking of zone(s) from a set of zones. Let me know if it makes sense and already aligns with the above suggestion.
pkg/volume/gce_pd/gce_pd.go
Outdated
if k == kubeletapis.LabelZoneFailureDomain { | ||
zones, err := util.LabelZonesToSet(v) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to convert label string for Zone: %s to a Set", v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wrap error here
pkg/volume/gce_pd/gce_pd.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("failed to convert label string for Zone: %s to a Set", v) | ||
} | ||
values = zones.List() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we go straight from label to list here instead of going through an intermediate set?
// SelectZonesForVolume selects zones for a volume based on several factors: | ||
// node.zone, allowedTopologies, zone/zones parameters from storageclass, | ||
// zones with active nodes from the cluster. The number of zones = replicas. | ||
func SelectZonesForVolume(zoneParameterPresent, zonesParameterPresent bool, zoneParameter string, zonesParameter, zonesWithNodes sets.String, node *v1.Node, allowedTopologies []v1.TopologySelectorTerm, pvcName string, numReplicas uint32) (sets.String, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since a lot of these parameters are mutually exclusive to each other, it might be worth having a SelectZonesForVolume
that only takes in one of these params called zones
, and a wrapper function that does all this logic to determine mutual exclusivity as well as picking which of the zone|zones|topology
to pass to SelectZonesForVolume
pkg/volume/util/util.go
Outdated
if zones, err := stringToSet(zonesString, ","); err != nil { | ||
return nil, fmt.Errorf("error parsing zones %s, must be strings separated by commas: %v", zonesString, err) | ||
} else { | ||
return zones, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doesn't have to be in else
block
pkg/volume/util/util.go
Outdated
// pick node's zone and ignore any allowedTopologies (since scheduler is assumed to have accounted for it already) | ||
z, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain] | ||
// pick node's zone for one of the replicas | ||
zoneFromNode, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is zoneFromNode
only being set inside the scope of this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is the only place zoneFromNode
is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I meant that by using :=
, I think zoneFromNode
is redeclared inside the block and the variable outside the if
branch isn't actually being set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This is something I clearly missed. I was incorrectly thinking that :=
will just initialize the ok
and honor zoneFromNode
from the broader scope. But as you mentioned, zoneFromNode
gets reinitialized due to :=
. The unit-test to detect this was also getting lucky it seems - will fix that too.
Addressed all code review feedback so far. |
pkg/volume/util/util.go
Outdated
@@ -539,6 +604,33 @@ func ChooseZoneForVolume(zones sets.String, pvcName string) string { | |||
return zone | |||
} | |||
|
|||
// chooseZonesForVolumeIncludingZone is a wrapper around ChooseZonesForVolume that ensures zoneToInclude is chosen | |||
// zoneToInclude can either be empty in which case it is ignored. If non-empty, zoneToInclude is expected to be member of zones. | |||
// numReplicas is expected to be > 0 and >= zones.Len() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"<= zones.Len()"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final nits from me, lgtm otherwise!
pkg/volume/gce_pd/gce_pd_test.go
Outdated
@@ -91,6 +96,16 @@ func (fake *fakePDManager) DeleteVolume(cd *gcePersistentDiskDeleter) error { | |||
return nil | |||
} | |||
|
|||
func getNodeSelectorRequirementWithKey(key string, term v1.NodeSelectorTerm) (*v1.NodeSelectorRequirement, error) { | |||
for _, r := range term.MatchExpressions { | |||
if r.Key != key { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use positive expressions when possible, especially here this feels a little backwards
instead, maybe:
if r.Key == key{
return &r, nil
}
pkg/volume/util/util.go
Outdated
// pick node's zone and ignore any allowedTopologies (since scheduler is assumed to have accounted for it already) | ||
z, ok := node.ObjectMeta.Labels[kubeletapis.LabelZoneFailureDomain] | ||
// pick node's zone for one of the replicas | ||
ok := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: var ok bool
/test pull-kubernetes-node-e2e |
/lgtm |
/test pull-kubernetes-node-e2e |
/lgtm |
Can you squash your commits? |
Signed-off-by: Deep Debroy <[email protected]>
squashed commits |
CC @verult Please ensure |
@saad-ali |
/lgtm |
/test pull-kubernetes-e2e-kops-aws |
@ddebroy yep it should still work, but since eventually we plan to use NodeAffinity as the main way to specify topology, |
Otherwise LGTM. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidz627, ddebroy, saad-ali, verult The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
For backwards compatibility, we cannot replace the labels check until the labels method is fully removed. |
/test all [submit-queue is verifying that this PR is safe to merge] |
We intentionally keep the old labeling + NodeAffinity for this backwards compatibility reason. |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
@msau42 right, we could check using both NodeAffinity and labels, prioritizing NodeAffinity |
What this PR does / why we need it:
This PR adds support for the DynamicProvisioningScheduling feature for GCE PD and RePD. With this in place, if VolumeBindingMode: WaitForFirstConsumer is specified in a GCE storageclass and DynamicProvisioningScheduling is enabled, GCE PD provisioner will use the selected node's LabelZoneFailureDomain as (1) the zone to provision a GCE PD volume in (2) one of the zones to provision GCE RePD volume in.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
E2E tests for DynamicProvisioningScheduling scenarios for GCE PD to follow
Release note:
/sig storage
/assign @msau42