Use fsmount API to avoid PAGE_SIZE limit for erofs #12783

Merged
fuweid merged 1 commit into containerd:main from ChengyuZhu6:mount on Jan 20, 2026
@ChengyuZhu6
Member

The traditional mount() syscall limits mount options to PAGE_SIZE (typically 4 KB). Use the new mount API (fsopen/fsconfig/fsmount/move_mount), introduced in Linux 5.2, to bypass this limitation.

Fixed: #12662

Comment thread plugins/mount/erofs/plugin_linux.go Outdated
Comment thread plugins/mount/erofs/plugin_linux.go Outdated
Comment thread plugins/mount/erofs/plugin_linux.go Outdated
@hsiangkao
Member

cc @halaney @rata @azr for visibility, and for the parts that are common to the new mount APIs

optionsBuilder.WriteString("ro")

for i := 0; i < 100; i++ {
	fakeDevice := filepath.Join(tempDir, strings.Repeat("x", 50), "fake_device_"+strings.Repeat("0", 10))
Member

The idea looks good.

Just sharing some context: there is still a limitation here

I tried using fsopen 2–3 years ago for OverlayFS mounting, but ran into a restriction with key=value parsing in the kernel (see: https://elixir.bootlin.com/linux/v6.12.6/source/fs/fsopen.c#L429). When setting a string option, both the key and the value must be less than 256 bytes. However, OverlayFS options like lowerdir=xx:yy:zz:... can easily exceed this limit.

Maybe I used it incorrectly, but it does look like there is a hard limit on the key/value length.

Member

@hsiangkao hsiangkao Jan 16, 2026

Hi @fuweid, overlayfs introduced a new representation in the new mount API for the issue you mentioned; you could check out the latest overlayfs kernel doc.

Member

especially this part

Since kernel version v6.8, directory names containing colons can also be configured as lower layer using the “lowerdir+” mount options and the fsconfig syscall from new mount api. For example:

fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/a:lower::dir", 0);

although I agree it needs more recent kernel versions.

As for erofs, we pass multiple devices in multiple device= arguments rather than in one string split on : as overlayfs does, so it won't be a hard issue (although 256 is shorter than PATH_MAX).

Member

Even newer versions support fd passing, as I said in #11354 (comment).
It could be resolved with a mount manager expression like {{ overlay 0 n-1 }} {{mountpoint 0}}, for example.

EROFS doesn't have an fd-passing feature yet; I could add one later.

Member

@fuweid fuweid Jan 16, 2026

Since kernel version v6.8, directory names containing colons can also be configured as lower layer using the “lowerdir+” mount options and the fsconfig syscall from new mount api. For example:

Nice! When I tried this API, the latest kernels were still in the 5.x series 😂

Member

As for erofs, we pass multiple devices in multiple device= arguments rather than in one string split on : as overlayfs does

👍 +1 for this api

Contributor

@fuweid besides what @hsiangkao said, I think there is another way around that: since 6.15, overlayfs supports using fds for lower dirs. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ff053b98a0f039e52c2bd8d0cb38f2831edfaf5.

So the path length is not an issue if we use fds. It can definitely help in the idmap case too, as we don't need to mount the idmapped lower layers :)

Member

I am not sure if we can skip idmapped lower layers here. But it's a good move to use open_tree to handle the 4096-byte case without having to chdir in the Go runtime.

Member Author

Thank you both for the review. As @hsiangkao mentioned, erofs passes each 'device=' option individually rather than as one long string. Since each path is well under 256 bytes, we stay within the per-option limit while still bypassing the original PAGE_SIZE restriction.

}

// Fsopen opens a filesystem context for configuration.
func Fsopen(fsName string, flags int) (*os.File, error) {
Member

Side note: once we use Fsopen, we should use mount.Unmount to unmount that mountpoint, just in case the fd is held by some new process. REF: f40bfc4

Member Author

Thanks. erofsMountHandler.Unmount is indeed using mount.Unmount

@ChengyuZhu6 ChengyuZhu6 moved this from Needs Triage to Needs Reviewers in Pull Request Review Jan 16, 2026
Member

Suggestion: we can move this API into pkg/internal first. Eventually, we can export fsmount via core/mount.

Member Author

Moved.

Comment thread core/mount/fsmount_linux.go Outdated
// Handle key=value options
if key, val, ok := strings.Cut(o, "="); ok {
	if err := unix.FsconfigSetString(int(fsctx.Fd()), key, val); err != nil {
		return fmt.Errorf("failed to set option %s=%s: %w", key, val, err)
Member

Suggested change
- return fmt.Errorf("failed to set option %s=%s: %w", key, val, err)
+ return fmt.Errorf("failed to set string option %s=%s: %w", key, val, err)

Member Author

Fixed.
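
For readers following this thread, the option handling under review can be sketched in isolation. The snippet below is an assumption-laden illustration, not the merged code: the `configurer` interface, the `recorder` test double, and `applyOptions` are hypothetical names standing in for a filesystem context that would wrap calls such as `unix.FsconfigSetString`. Only the key=value branch and its error message come from the fragment above; the bare-flag branch is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// configurer is a hypothetical abstraction over a filesystem context.
// A real implementation would wrap the fsconfig calls on the fsopen fd.
type configurer interface {
	SetString(key, val string) error
	SetFlag(key string) error
}

// recorder is a stand-in that records calls instead of touching the kernel.
type recorder struct{ calls []string }

func (r *recorder) SetString(key, val string) error {
	r.calls = append(r.calls, "string "+key+"="+val)
	return nil
}

func (r *recorder) SetFlag(key string) error {
	r.calls = append(r.calls, "flag "+key)
	return nil
}

// applyOptions issues one configuration call per option: key=value pairs are
// set as strings, bare words as flags. Empty entries are skipped.
func applyOptions(c configurer, options []string) error {
	for _, o := range options {
		if o == "" {
			continue
		}
		if key, val, ok := strings.Cut(o, "="); ok {
			if err := c.SetString(key, val); err != nil {
				return fmt.Errorf("failed to set string option %s=%s: %w", key, val, err)
			}
			continue
		}
		if err := c.SetFlag(o); err != nil {
			return fmt.Errorf("failed to set flag option %s: %w", o, err)
		}
	}
	return nil
}

func main() {
	r := &recorder{}
	opts := []string{"ro", "device=/dev/loop1", "device=/dev/loop2"}
	if err := applyOptions(r, opts); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```

Because every option becomes its own call, no single argument has to carry the full option list, which is the property this PR relies on.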

Comment thread plugins/mount/erofs/plugin_linux_test.go Outdated

The traditional mount() syscall has a PAGE_SIZE (typically 4KB) limit
for mount options. Use the new mount API (fsopen/fsconfig/fsmount/
move_mount) introduced in Linux 5.2 to bypass this limitation.

Fixed: containerd#12662

Signed-off-by: ChengyuZhu6 <[email protected]>
@github-project-automation github-project-automation Bot moved this from Needs Reviewers to Review In Progress in Pull Request Review Jan 20, 2026
@hsiangkao
Member

@containerd/committers ... merge this? cc @fuweid

@fuweid fuweid added this pull request to the merge queue Jan 20, 2026
Merged via the queue into containerd:main with commit 4fb4c9d Jan 20, 2026
52 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Jan 20, 2026
@dmcgowan dmcgowan changed the title from "plugins/mount/erofs: use fsmount API to avoid PAGE_SIZE limit" to "Use fsmount API to avoid PAGE_SIZE limit for erofs" Mar 17, 2026
@dmcgowan dmcgowan added the impact/changelog, area/snapshotters (Snapshotters), and area/storage (Image Storage) labels and removed the area/snapshotters (Snapshotters) and area/runtime (Runtime) labels Mar 17, 2026
Development

Successfully merging this pull request may close these issues.

erofs snapshotter: "mount options is too long" with 100 layer image

7 participants