-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new VolumeAttachment
API Object
#54463
Introduce new VolumeAttachment
API Object
#54463
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I miss the object registration in pkg/registry/storage/volumeattachment.. IMO it should be in this PR - the API object is unusable without it.
allErrs = append(allErrs, field.Invalid(fldPath, nodeName, msg)) | ||
} | ||
return allErrs | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be also ValidateVolumeAttachmentUpdate
that makes sure that Spec can't be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed now few lines below.
@kubernetes/api-reviewers @kubernetes/sig-storage-api-reviews |
pkg/apis/storage/types.go
Outdated
// Indicates the volume is successfully attached. | ||
// This field must only be set by the entity completing the attach | ||
// operation, i.e. the external-attacher. | ||
IsAttached bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple things to consider from the API conventions:
- Think twice about bool fields. Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansions by describing the policy options explicitly as a string type alias (e.g. TerminationMessagePolicy).
- The name of a field expressing a boolean property called 'fooable' should be called Fooable, not IsFooable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it seems like we'd need to be able to tell from status whether attach had begun.
Since attach is not instant, there could be a period of time during
- Kubernetes creates VolumeAttachment
- Attacher observes VolumeAttachment, starts attaching
- PV/VolumeAttachment objects are vulnerable to deletion/finalization, since status doesn't indicate the volume is attached or is attaching
- Attacher completes attaching, tries to update status to indicate attached state, object is gone
It seems like it should be possible to tell where we are in the detached -> attaching -> attached -> detaching -> detached
lifecycle by looking at status (and making the attacher do a successful status update from detached -> attaching
before starting the attach operation prevents races trying to attach a volume that is getting deleted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that we do not need "attaching" state, simple presence of the object with (Is)Attached==false
is enough. The same for "detaching", presence of deletetionTimestamp
is enough.
detached
->attaching
This is not possible. VolumeAttachment with deletetionTimestamp
must be deleted first in API server and Kubernetes must create a new instance. Or is it allowed to cancel deletetionTimestamp? It looks odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not want to introduce state machines with detached -> attaching -> attached -> detaching -> detached because it's not possible to add a new state there later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
detached -> attaching
This is not possible
Maybe I used the wrong term. Is unattached -> attaching -> attached -> detaching -> detached
more correct? Don't we need to represent the state where the attacher has observed the request for the attachment and is currently working to attach (so the VolumeAttachment object should not be removed), but it has not yet attached (so the volume is not ready to mount)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to capture the state that attacher has started attaching. A/D controller with time out eventually and send some event to pod. There is a glitch though, there is no way how to tell if external attacher does not run or it's just slow.
pkg/apis/storage/types.go
Outdated
Attacher string | ||
|
||
// The volume to attach. | ||
Volume api.VolumeSource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that VolumeAttachment is a cluster-scoped type that references api.VolumeSource
instead of api.PersistentVolumeSource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It serves as memento of pod's VolumeSource in case the pod is deleted and we need to detach its volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no place to indicate the original namespace, pod name or pod uid... is none of that information needed to resolve the volume or referenced secrets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We moved to supporting PersistentVolumes only and we use namespaced SecretReferences there
cc @kubernetes/sig-storage-api-reviews |
pkg/apis/storage/types.go
Outdated
// This field must only be set by the entity completing the attach | ||
// operation, i.e. the external-attacher. | ||
// +optional | ||
AttachError VolumeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be a pointer? kubectl get volumeattachment
shows ugly output when I created an instance with no Status:
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
creationTimestamp: 2017-10-24T16:26:47Z
name: first
namespace: ""
resourceVersion: "341"
selfLink: /apis/storage.k8s.io/v1/volumeattachments/first
uid: 2192999e-b8d8-11e7-b26b-525400224e72
spec:
attacher: csi/example
nodeName: localhost
volume:
hostPath:
path: /tmp
type: ""
status:
attachError:
Message: ""
time: null
detachError:
Message: ""
time: null
isAttached: false
kind: List
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/apis/storage/types.go
Outdated
// This field must only be set by the entity completing the attach | ||
// operation, i.e. the external-attacher. | ||
// +optional | ||
AttachError VolumeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer a status slice similar to Pod Conditions
. It helps track the evolution attachment status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed that and there are rumors that Conditions are bad design and we should not use them for new features. I am not sure who's source of that rumor, @thockin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/apis/storage/types.go
Outdated
// This field must only be set by the entity completing the detach | ||
// operation, i.e. the external-attacher. | ||
// +optional | ||
DetachError VolumeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the VolumeAttachment object used as more like record/status keeping object or used to convey a desired state?
If the latter, then im assuming the detach is triggered when this VolumeAttachment object is deleted. So if an failure happens during the detach, the external-attacher will not have this object to specify the DetachError anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- VolumeAttachment is used for both -
Spec
for the desired state andStatus
for record keeping. - We use finalizers to keep "deleted" VolumeAttachment instances around until associated volume is truly detached from a node. See CSI Volume Plugins in Kubernetes Design Doc community#1258 for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are VolumeAttachment objects 1:1 with PV objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, one volume can be attached to multiple nodes and thus will have multiple VolumeAttachment instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so they are unique by (pv,node)? is that enforced by convention or by pinning the name format to something like <pvName>.<nodeName>
, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just an convention. IMO, nobody but Kubernetes A/D controller should be allowed to create this object.
pkg/apis/storage/types.go
Outdated
Volume api.VolumeSource | ||
|
||
// The node that the volume should be attached to. | ||
NodeName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the VolumeAttachment object used during a multi-attachments scenario? For example with RO volumes, will there be multiple of these VolumeAttachment objects for every RO "attachment" even if the volume source and node is the same or will there just be one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also store information about pods and mountDir in the case where plugins are not using the attacher-detacher controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
volumes, will there be multiple of these VolumeAttachment objects for every RO "attachment" even if the volume source and node is the same or will there just be one?
We do not plan to change current behavior, i.e. a volume is attached to a node only once. So there will be one instance per node-volume pair. Kubelet then mounts the single device to individual pods (using bind-mounts).
Should you also store information about pods and mountDir in the case where plugins are not using the attacher-detacher controller?
I am not sure I understand the question. That's kubelet job and is not related to volume attachment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was alluding to the fact that in non-attachable plugins, kubelet mount() does everything (attach, mountDevice and mount podDir). So my question was if there is a need to store pod information to determine when it is safe to detach (delete VolumeAttach object).
Look like kubelet can determine how many bind mount ref are there during a detach and delete the attachment once there are no bind mount left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VolumeAttachment
is used only for attachable plugins (CSI). See the proposal for handling of CSI plugins that don't support ControllerPublishVolume
call.
And indeed, kubelet has its own means how to track pods that use the volume and it won't let Attach/Detach controller to detach a volume that's still mounted.
dcb2739
to
35f6b5d
Compare
I updated the API to current version in the proposal + added its registration and validation. |
35f6b5d
to
dbcd53a
Compare
c8a78a6
to
df157c6
Compare
We should be judicious with shortnames like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From other doc:
Did we rule out simply storing this as fields on PV?
pkg/apis/storage/types.go
Outdated
// Right now only PersistenVolumes can be attached via external attacher, | ||
// in future we may allow also inline volumes in pods. | ||
// Exactly one member can be set. | ||
type AttachedVolumeSource struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we've swung towards discriminated unions. Given that this has just one item, I suppose we can forego the discriminator for now, but as soon as we add a second option, we need it. We can add a type
field that defaults to "PersistentVolume" when we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
|
||
// Spec is read-only | ||
if !apiequality.Semantic.DeepEqual(old.Spec, new.Spec) { | ||
allErrs = append(allErrs, field.Invalid(field.NewPath("spec"), new.Spec, "spec is immutable after creation")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should just say "field is immutable"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing
@caesarxuchao |
Ack. Removing.
Mostly because it makes detaching trickier. |
2697fcb
to
1744587
Compare
[MILESTONENOTIFIER] Milestone Pull Request Needs Approval @childsb @saad-ali @thockin @kubernetes/sig-storage-misc Action required: This pull request must have the Pull Request Labels
|
This is a ludicrous amount of work to add a new API. Absolutely insane. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: saad-ali, thockin Associated issue: 178 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments and then the apimachinery related code lgtm.
} | ||
|
||
func (volumeAttachmentStrategy) AllowUnconditionalUpdate() bool { | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want this to be true?
It will defeat the optimistic concurrency built around resourceVersion. See
kubernetes/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go
Lines 574 to 579 in 02b3928
if doUnconditionalUpdate { | |
// Update the object's resource version to match the latest | |
// storage object's resource version. | |
err = e.Storage.Versioner().UpdateObject(obj, res.ResourceVersion) | |
if err != nil { | |
return nil, nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Changed it to false.
} | ||
|
||
func (volumeAttachmentStrategy) ValidateUpdate(ctx genericapirequest.Context, obj, old runtime.Object) field.ErrorList { | ||
errorList := validation.ValidateVolumeAttachment(obj.(*storage.VolumeAttachment)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a variable and assign obj.(*storage.VolumeAttachment)
to it, to save another type assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// TODO: move SchemeBuilder with zz_generated.deepcopy.go to k8s.io/api. | ||
// localSchemeBuilder and AddToScheme will stay in k8s.io/kubernetes. | ||
SchemeBuilder = runtime.NewSchemeBuilder(addKnownTypes) | ||
localSchemeBuilder = &SchemeBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you copied the pattern from some other files, but still please remove the localSchemeBuilder
, it's unnecessary. Also please remove the TODO at line 37.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Introduce the v1alpha1 version to the Kubernetes storage API. And add a new VolumeAttachment object to that version. This object will initially be used only by the new CSI Volume Plugin. Eventually existing volume plugins can be refactored to use it too.
@caesarxuchao feedback addressed, PTAL. |
865a3be
to
9f294c1
Compare
@saad-ali: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
apimachinery part lgtm. |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
With hack/local-up-cluster.sh, I don't see storage.k8s.io/v1alpha1 API:
Do I need to enable it somewhere? |
alpha API groups are not enabled by default. you can enable it with |
Hi, I don't know why CI didn't find this, But I get this error:
|
ref: |
Automatic merge from submit-queue (batch tested with PRs 57266, 58187, 58186, 46245, 56509). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. e2e: CSI Tests **What this PR does / why we need it**: This e2e test tests the CSI external attacher with a mock CSI plugin driver. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: kubernetes/enhancements#178 **Special notes for your reviewer**: * Tests features in kubernetes/enhancements#178 * Tests implementation of kubernetes/community#1258 * Tests VolumeAttachment Object: #54463 **Release note**: ```release-note NONE ```
What this PR does / why we need it:
Introduce a new
VolumeAttachment
API Object. This object will be used by the CSI volume plugin to enable external attachers (see design here. In the future, existing volume plugins can be refactored to use this object as well.Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): Part of issue kubernetes/enhancements#178Special notes for your reviewer:
None
Release note: