[bug & proposal] overlayfs support more-layers-image #2497
Comments
Ideally we should reexec+chdir but I fear it breaks API compatibility. Can we keep the current API and add an overlayfs-specific chdir hack to https://github.com/containerd/containerd/blob/master/mount/mount_linux.go?
Agree with the API compatibility point. I think we can use
Doable but hackier. 😄
That would make a containerd v1.1 client incompatible with a containerd v1.2 daemon.
I guess it'll be a while before it is generally available enough for us to rely on, but it seems there is work afoot in kernel-land which would remove the 4k limit on mount options: Six (or seven) new system calls for filesystem mounting.
The client just sends the mount info into
That's true if the client just lets the daemon
Thanks for bringing this up, this was next on my 1.2 TODO of issues to look at. My plan was to implement this functionality entirely in the
I ordered those by which I think are the easiest/best options. I think reexec is the worst option and would like to avoid it if possible (although I did see some code somewhere that was able to successfully perform a real fork in Go). The second option is a bit confusing, but it would allow the deepest support for overlay by leveraging overlay's support for having an overlay mount in the
@dmcgowan Thank you for the detailed explanation! Both the symlink and the 2-layer mount are better than re-exec. But they also add the burden of cleaning up the tmpdir or symlinks after
However, we don't need to chdir all the time. Skip the
@fuweid using Docker's re-exec is not an option. We don't pull in pkg from
The mount package would also be responsible for taking care of this
@dmcgowan I will try the gVisor way. Thank you!
Isn't the need for
It's now safe to
There is still no way to prevent the
Thanks @dmcgowan, @ijc and @AkihiroSuda for the help!
There is still a limit of around 260 layers, according to my test.
Bug
Description
The number of layers is limited by the max size of the mount option buffer in the kernel (1 page/4096 bytes in general). For now, containerd uses the absolute path of the snapshotter. Basically, the `root` path is like `/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots` (68 bytes). We cannot pull an image which has almost 60 layers (~ 4096/68).
Reproduce
`ctr pull ${imageName}` and you will get:
containerd version
Proposal
I have two proposals for this issue:
Symlink the `snapshots` dir
Create a link to the `snapshots` dir, like `/tmp/ctrdl`, to `compact` the `lowerdir` option. However, the tmp link is out of our control; for example, it could be cleaned up by `tmpwatch`. It is also hard to maintain the link across start/stop/start...
Use `reexec` to change work dir before mount
Like moby, use `reexec` to fork a process to do the mount. Since the snapshotter service only provides the `mount` option and performs no mount action, I want to change the `github.com/containerd/containerd/mount` behaviour, like:
The `func (m *Mount) Mount` will use `reexec` if the `Chdir` is not empty. For overlayfs, containerd will change the work dir into `snapshots` and mount the layers, like:
The snapshot id takes at most 20 digits, so this can support more than 128 layers for overlayfs.
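The layer estimates in this issue are simple divisions of the option buffer by the per-layer path length. A minimal Go sketch of that arithmetic follows. The helper `maxLayers` is hypothetical, and the `/fs` suffix on each relative entry is an assumption about the snapshot layout, not something stated in this issue.

```go
package main

import "fmt"

// pageSize is the usual size of the kernel's mount option buffer (one page).
const pageSize = 4096

// maxLayers is a hypothetical helper: it estimates how many lowerdir entries
// fit in the option buffer when each entry costs perLayer bytes and the rest
// of the option string (upperdir, workdir, ...) takes fixed bytes.
func maxLayers(perLayer, fixed int) int {
	if perLayer <= 0 || fixed >= pageSize {
		return 0
	}
	return (pageSize - fixed) / perLayer
}

func main() {
	// Absolute paths: about 68 bytes per layer, the figure used in the description.
	fmt.Println(maxLayers(68, 0)) // 60

	// Relative paths after a chdir: a 20-digit id, an assumed "/fs" suffix,
	// and one ":" separator per entry.
	fmt.Println(maxLayers(20+len("/fs")+1, 0)) // 170
}
```

Under these assumptions the relative form leaves room for about 170 entries before any fixed options are counted, which is consistent with the "more than 128 layers" estimate.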
Since the `snapshot service`, `task service` and `runtime/shim service` all consume the mount option, this proposal will change the proto file, too.
It seems that `reexec` can handle more layers in overlayfs well. However, it needs to change the API and involves `reexec` behaviour like moby. Does it make sense?
ping @dmcgowan and @AkihiroSuda
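As an illustration of the chdir idea discussed in this issue, here is a minimal Go sketch that rewrites the `lowerdir` option relative to the common parent directory of its entries. `compactLowerdir` is a hypothetical helper written for this example, not containerd's API. The sketch only rewrites the option strings; actually changing directory and calling mount(2) (in a forked process or otherwise isolated from the rest of the daemon) is left to the caller.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// compactLowerdir is a hypothetical helper (not containerd's API). It finds
// the common parent directory of all lowerdir entries and returns that
// directory together with a copy of the options in which the lowerdir paths
// are relative to it. If nothing can be shortened, it returns "" and the
// original options unchanged.
func compactLowerdir(options []string) (string, []string) {
	const prefix = "lowerdir="
	for i, opt := range options {
		if !strings.HasPrefix(opt, prefix) {
			continue
		}
		dirs := strings.Split(strings.TrimPrefix(opt, prefix), ":")
		if len(dirs) < 2 {
			return "", options
		}
		// Walk up from the first entry until every entry lives below common.
		common := filepath.Dir(dirs[0])
		for _, d := range dirs[1:] {
			for common != "/" && common != "." && !strings.HasPrefix(d, common+"/") {
				common = filepath.Dir(common)
			}
		}
		if common == "/" || common == "." {
			return "", options
		}
		rel := make([]string, len(dirs))
		for j, d := range dirs {
			rel[j] = strings.TrimPrefix(d, common+"/")
		}
		out := append([]string{}, options...)
		out[i] = prefix + strings.Join(rel, ":")
		return common, out
	}
	return "", options
}

func main() {
	root := "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
	opts := []string{
		"workdir=" + root + "/3/work",
		"upperdir=" + root + "/3/fs",
		"lowerdir=" + root + "/2/fs:" + root + "/1/fs",
	}
	chdir, compact := compactLowerdir(opts)
	fmt.Println(chdir)      // the snapshots root
	fmt.Println(compact[2]) // lowerdir=2/fs:1/fs
}
```

The `upperdir` and `workdir` options stay absolute in this sketch, since absolute paths resolve the same way regardless of the working directory; only the repeated `lowerdir` prefix is removed.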