Use fsmount API to avoid PAGE_SIZE limit for erofs#12783
Use fsmount API to avoid PAGE_SIZE limit for erofs#12783fuweid merged 1 commit intocontainerd:mainfrom
Conversation
| optionsBuilder.WriteString("ro") | ||
|
|
||
| for i := 0; i < 100; i++ { | ||
| fakeDevice := filepath.Join(tempDir, strings.Repeat("x", 50), "fake_device_"+strings.Repeat("0", 10)) |
There was a problem hiding this comment.
The idea looks good.
Just sharing some context: there is still a limitation here
I tried using fsopen 2–3 years ago for OverlayFS mounting, but ran into a restriction with key=value parsing in the kernel (see: https://elixir.bootlin.com/linux/v6.12.6/source/fs/fsopen.c#L429). When setting a string option, both the key and the value must be less than 256 bytes. However, OverlayFS options like lowerdir=xx:yy:zz:... can easily exceed this limit.
Maybe I used it incorrectly, but it does look like there is a hard limit on the key/value length.
There was a problem hiding this comment.
hi @fuweid , overlayfs introduces a new representation for new mount APIs on the issue you mentioned, you could check out the latest overlayfs kernel doc.
There was a problem hiding this comment.
especially this part
Since kernel version v6.8, directory names containing colons can also be configured as lower layer using the “lowerdir+” mount options and the fsconfig syscall from new mount api. For example:
fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/a:lower::dir", 0);
although I agree it needs more recent kernel versions.
As for erofs, we pass multiple devices in multiple device= arguments rather than overlayfs way splited in : so it won't have a hard issue (although 256 is shorter than PATH_MAX.)
There was a problem hiding this comment.
even newer versions support fd passing, as I said in #11354 (comment).
It could be resolved with mount manager expression like {{ overlay 0 n-1 }} {{mountpoint 0}} for example.
EROFS doesn't have the passing fd feature, I could form one later.
There was a problem hiding this comment.
Since kernel version v6.8, directory names containing colons can also be configured as lower layer using the “lowerdir+” mount options and the fsconfig syscall from new mount api. For example:
Nice!. When I try this API, the latest kernel is still starting from 5.X 😂
There was a problem hiding this comment.
As for erofs, we pass multiple devices in multiple device= arguments rather than overlayfs way splited in :
👍 +1 for this api
There was a problem hiding this comment.
@fuweid besides what @hsiangkao said, there is also another way to go around that, I think: overlayfs supports since 6.15 to use fds for lower dirs. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ff053b98a0f039e52c2bd8d0cb38f2831edfaf5.
So the path length is not an issue if we use fds. It can definitely help in the idmap case too, as we don't need to mount the idmapped lower layers :)
There was a problem hiding this comment.
I am not sure if we can skip idmapped lower layers here. But it's good move to use open_tree to handle 4096 case without changing chdir in go runtime.
There was a problem hiding this comment.
Thank you both for the review. As @hsiangkao mentioned, erofs use multiple 'device=' options individually rather than one long string. Since each path is well under 256 bytes, we avoid the limit while successfully bypassing the original PAGE_SIZE restriction.
| } | ||
|
|
||
| // Fsopen opens a filesystem context for configuration. | ||
| func Fsopen(fsName string, flags int) (*os.File, error) { |
There was a problem hiding this comment.
Side note: once we use Fsopen, we should use mount.Umount to umount that mountpoint. just in case that fd is hold by some new process - REF: f40bfc4
There was a problem hiding this comment.
Thanks. erofsMountHandler.Unmount is indeed using mount.Unmount
There was a problem hiding this comment.
Suggestion: we can move this API into pkg/internal first. Eventually, we can export fsmount via core/mount.
| // Handle key=value options | ||
| if key, val, ok := strings.Cut(o, "="); ok { | ||
| if err := unix.FsconfigSetString(int(fsctx.Fd()), key, val); err != nil { | ||
| return fmt.Errorf("failed to set option %s=%s: %w", key, val, err) |
There was a problem hiding this comment.
| return fmt.Errorf("failed to set option %s=%s: %w", key, val, err) | |
| return fmt.Errorf("failed to set string option %s=%s: %w", key, val, err) |
| optionsBuilder.WriteString("ro") | ||
|
|
||
| for i := 0; i < 100; i++ { | ||
| fakeDevice := filepath.Join(tempDir, strings.Repeat("x", 50), "fake_device_"+strings.Repeat("0", 10)) |
There was a problem hiding this comment.
I am not sure if we can skip idmapped lower layers here. But it's good move to use open_tree to handle 4096 case without changing chdir in go runtime.
The traditional mount() syscall has a PAGE_SIZE (typically 4KB) limit for mount options. Use the new mount API (fsopen/fsconfig/fsmount/ move_mount) introduced in Linux 5.2 to bypass this limitation. Fixed: containerd#12662 Signed-off-by: ChengyuZhu6 <[email protected]>
|
@containerd/committers ... merge this? cc @fuweid |
The traditional mount() syscall has a PAGE_SIZE (typically 4KB) limit for mount options. Use the new mount API (fsopen/fsconfig/fsmount/ move_mount) introduced in Linux 5.2 to bypass this limitation.
Fixed: #12662