-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add support for overlay volatile option #4785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for overlay volatile option #4785
Conversation
|
Hi @mauriciovasquezbernal. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Build succeeded.
|
|
Nice! Thanks! The umount overlayfs brings so many syncfs, like write amplification. It can reduce disk syncfs when read the user/group ID from image. 👍 |
|
Is this still draft? |
|
@AkihiroSuda the main reason for this to be draft still is because we wanted to know your (or other maintainers) opinion regarding the questions @mauriciovasquezbernal asked in the PR description. Other than that, in the limited testing we did this PR works fine and we see good performance improvements (~20% as mentioned in the commit adding it torvalds/linux@c86243b) If you think this direction is fine (only for the rootfs of containers created via the CRI plugin), we can make it ready quite soon. If you think other direction is best, please let us know and we will adapt :-) Thanks! |
|
@AkihiroSuda does the current approach in the PR sound fine for you? Or do you suggest some other way? |
d37c3e9 to
2b8cefe
Compare
|
Build succeeded.
|
2b8cefe to
141e7df
Compare
|
Build succeeded.
|
|
I discussed internally with @rata and came up with an new implementation that should be cleaner. Now the volatile option is configured at the snapshotter level and it affects all the containers created by containerd, not only the ones created by the CRI plugin. We know a configuration parameter adds complexity, but we don't think we can enable that by default right now (until the remount problems are solved in the kernel). We think this is an useful option and some users would like to enable it. We'd appreciate any feedback. Thanks! |
141e7df to
7a52544
Compare
|
Build succeeded.
|
|
@mauriciovasquezbernal Can you rebase the branch? @fuweid PTAL /ok-to-test |
Linux 5.10 adds support for a new volatile option for overlay mounts that increases performance by avoiding sync calls to the filesystem where the upper directory resides. This commit adds an option to the overlayfs snapshotter to enable this feature. Mounts using the volatile option can't be remounted, for this reason there is some logic in the WithTempMount function to avoid using that option there. Signed-off-by: Mauricio Vásquez <[email protected]>
7a52544 to
63a24ba
Compare
|
@kzys I rebased it! |
|
@mauriciovasquezbernal Thanks! |
samuelkarp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| // option here. | ||
| // TODO: Make this logic conditional once the kernel supports reusing | ||
| // overlayfs volatile mounts. | ||
| for _, m := range mounts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we can remove this part. The caller should validate the volatile option at the beginning.
In overlayFS snapshotter, we can check volatile in NewSnapshotter like https://github.com/containerd/containerd/blob/master/snapshots/overlay/overlayutils/check.go#L110 if set Volatile=true in toml.
If kernel doesn't support that option, just return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fuweid Hi, I'm a co-worker of Mauricio.
Not sure I follow what you mean. If the kernel doesn't support this option, it will error out when trying to mount with volatile already. Do you suggest to move this code to a function and we just call it from here? Or what is the suggestion? Not sure I follow.
The code here makes sure to not use volatile for temp mounts, because the overlayfs dirs (workdir at least) is reused for the following mounts. But re-using them is not supported by the kernel when you use volatile, so we need to remove the option if not the following mounts that re-use the work-dir will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rata sorry for late reply.
=> If the kernel doesn't support this option, it will error out when trying to mount with volatile already.
I was thinking we can add pre-check option during initializing overlayFS snapshotter, like
// From https://github.com/containerd/containerd/blob/master/snapshots/overlay/overlay.go
//
// NewSnapshotter returns a Snapshotter which uses overlayfs. The overlayfs
// diffs are stored under the provided root. A metadata file is stored under
// the root.
func NewSnapshotter(root string, opts ...Opt) (snapshots.Snapshotter, error) {
...
// figure out whether "index=off" option is recognized by the kernel
var indexOff bool
if _, err = os.Stat("/sys/module/overlay/parameters/index"); err == nil {
indexOff = true
}
// check volatile option
if config.volatile {
if err := overlayutils.SupportVolatile(root); err != nil {
return nil, err
}
}
...
}The pre-check can help us to know the option is supported before running container.
=> But re-using them is not supported by the kernel when you use volatile
Yeah, you are correct. This code in WithTempMount should be kept. In addition, the other case is that container engine uses WithTempMount to get the user id mapping. I think use volatile is good for this case. But I don't have an idea how to use it correctly.
The change looks good to me. I am fine with handling overlayutils.SupportVolatile in the follow-up pr.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fuweid heh, no late response, it was quite fast :)
Ohh, I see. Yes, if we can merge this (one LGTM is missing IIUC) and I'll do a follow-up PR adding that it would be great.
Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I think we might need to probe a dummy mount with the volatile option, not possible to do as we do with indexOff because that is a module parameter. Those are exposed on sysfs, not sure about mount options (think they are not). I smell this logic might end up in snapshots/overlay/overlayutils with similar functions.
Will check and open a new PR in the next days with that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this fail without this code? How is this different from the index option being set?
|
ping @dmcgowan ~ |
| return nil | ||
| } | ||
|
|
||
| // Volatile enables the volatile option of overlayfs to ommit all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
omit
What is the expected snapshotter behavior in this case? |
|
@dmcgowan has this been abandoned for an alternative implementation or could this be revived? thanks |
|
I've completely lost track of this. As I didn't open the PR it doesn't show me when looking for those, and thoes notification arrived while I was moving my emails so I probably lost them. Sorry! @natefive if you want to work on this, I'd be happy to help if I can. Otherwise, I can take this with lower prio (I'm working on user namespaces support in containerd first). Do you want to take this work? |
|
@rata happy to give it a try, I may struggle with the testing though from the above I can see:
however I am unclear, what the decided action for this thread #4785 (comment) was? thanks :D I will begin prepping my own PR based on this one - #8402 |
|
@natefive Thanks! We need to solve it, I've not proposed any concrete solution. I mentioned what I thought might be needed (a dummy mount) |
|
I've interpreted "a dummy mount" and previous context to mean add a SupportsVolatileMount func in overlayutils to test if volatile mounts are available https://github.com/containerd/containerd/pull/8402/files#diff-0d4696b29eacc6f2ed899b12982ab6fa275670ee474536e067f8778732211e56 then call this func from the NewSnapshotter func |
|
@natefive Yeap, that is what I had in mind. I don't know if there is anyway to avoid doing that, I don't remember now. Thanks for continuing this! :) |
|
@rata no problem - I guess I'll leave it in your more experienced hands to test it properly (it compiles, the units pass & CI passes with the exception of |
|
Sorry, I'll be afk for several days and was busy with kubecon. I can have a look later, when I'm back, but I don't think there is anything special to look at, if it is creating pods with the overlayfs filesystem mounted as volatile, it should be fine :) |
|
no worries have a good time at kubecon 🎉 |
We already had several discussions about enabling overlayfs mount option `volatile` which skips `fssync` against upperdir's file system. The benefit of taking the overlayfs feature in was clearly stated in those discussions. containerd#4785 containerd#6406 Compared with the aforementioned proposals, this solution is more granular and tries to encapsulate the feature only in CRI. It makes CRI capable of setting up rootfs by itself thus it has a chance to tweak snapshots mounts options And it does not need to change snapshotter's behavior. Fortunately, containerd's container API allows the client to specify rootfs(mounts slice) when `StartContainer`. As the rootfs is passed within TaskCreate request to shim who actually mounts the filesystem as rootfs, I suppose it's pretty safe for other critical parts in containerd's core service. We already convert all temp mounts to "no-upperdir" mounts. Those temp mounts won't result in `fssync`. The UI is to add a new configuration entry for CRI plugin ```toml [plugins."io.containerd.grpc.v1.cri".containerd] overlayfs_volatile = true ``` In the future, if any KEP comes up with enabling certain pod to be marked as `volatile`, we can extend this feature to make it only work for certain pods on the node. Signed-off-by: Changwei Ge <[email protected]>
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
It appears to be a duplicate of this PR https://github.com/containerd/containerd/pull/9555/files ? |
|
@CormickKneey It indeed seems containerd now has this functionality now with the mount options. Let me know if I'm missing something, of course. I'll ask @mauriciovasquezbernal to close this :) |
Linux 5.10 adds support for a new volatile option for overlay mounts that increases performance by avoiding sync calls to the filesystem where the upper directory resides. The goal of this PR is to discuss possible implementations to support this in containerd and also present some challenges regarding the remount when that option is used.
I did some experiments using this PR and Kubernetes and the time required to install emacs on an image with the cached packages drop from ~55s to ~30s on my machine.
Test details
The image I used was created with the following Docker file
A run with
kubectl run --restart=Never --image=mauriciovasquezbernal/testimage:latest m$RANDOMin Kubernetes.The volatile option has the particularity that it's not possible to perform a mount reusing the upper/work directories as they could be out of sync. Sargun Dhillon has sent some patches to relax this requirement in some cases where it's safe to perform such mount, however they probably won't be part of 5.10.
Containerd performs some mounts of the overlayfs with
WithTempMountbefore starting the container to inspect the image, then a mount for the rootfs of the container is done. This makes the support for the volatile option a bit difficult to implement as this option can only be used in the last mount (the root fs of the container).This PR proposes a way to implement it. The volatile option is not set by the snapshotter but it's set in
NewTask, in this way mounts performed withWithTempMountdon't have this option and only the final mount has it. This solution is far from ideal because it breaks the modularity polluting other components with overlay specific options, but it's the less invasive option we've found so far.We've considered the following options to far:
WithTempMountto remove it.WithTempMountWithTempMountwith overlayfs specific logicTempMountsmethod that return the mounts to be used in the temp mounts case. It's be the same asMountin all implementations but in overlayfs it'll return the mounts without the volatile optionI'd like to get more opinions and ideas on this.
cc @rata @alban