core/mount: Fix doPrepareIDMappedOverlay cleanup function by rata · Pull Request #12114 · containerd/containerd

rata · 2025-07-17T14:37:18Z

This PR fixes:

an issue where we can panic the go runtime by running a function that is nil if we take some error paths (containerd daemon continues to run fine, though)
the cleanup function was also adding noise to the logs on error paths (printing we failed to delete a directory, although we never created it, etc.)
The cleanup function didn't retry unmounting and could end up leaking mounts (containerd leaks many <root>/tmpmounts/ovl-idmapped* mounts until restart #12139)

While we are there, I took the opportunity to add tests for the cleanup function and test these issues, as well as previous issues we encounter, are fixed.

Please see individual commits for more details. I think it's simpler to review by checking each commit too.

PR #5890 introduced the issues in the cleanup function.

Fixes: #12139

k8s-ci-robot · 2025-07-17T14:37:29Z

Hi @rata. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

austinvazquez · 2025-07-17T14:48:05Z

/ok-to-test

rata · 2025-07-17T15:20:28Z

All green except fedora with an unrelated failure. Pushing again to re-run the tests

rata · 2025-07-17T15:20:44Z

@AkihiroSuda @fuweid can you take a look?

halaney

lgtm overall, I like the centralization of who makes/deletes the tmpdir.

halaney

I'll approve after testing again, but similar patch in my tree has worked already so I think its good to go as well.

halaney

I'll approve after testing again, but similar patch in my tree has worked already so I think its good to go as well.

Leaked mounts, be gone! :P When the umount fails due to EBUSY the second one succeeds for me.

rata · 2025-08-14T22:02:35Z

Tests are green! (failures were unrelated to this PR)

doPrepareIDMappedOverlay() can return nil as the cleanup function. As the function mandates for the callback to be called, even when errors are returned, we end-up calling the nil function that of course causes: panic: runtime error: invalid memory address or nil pointer dereference The containerd daemon continues to run fine, though. With the current structure of always calling the cleanup function, we would either need to check which error it is, to call it only on specific errors (ugly) or return a "stub function" that doesn't do anything on those cases (so the call works). None of these options seem very nice. Let's just switch the logic to a pattern more common on go: only need to cleanup if it returned fine. This makes it easier for callers to not do the wrong thing. For this change we need to do the cleanup when returning errors on the function itself, so the user doesn't need to do it for us. Signed-off-by: Rodrigo Campos <[email protected]>

rata · 2025-08-18T12:59:00Z

Rebased and all tests are green. @fuweid @AkihiroSuda can you please take a look?

fuweid · 2025-08-18T15:04:14Z

 			cleanup()

-			_, err = os.Stat(remountsLocation)
+			err = os.Remove(remountsLocation)


could you please add more information why we need to change this?

The original idea is to verify that remountsLocation should be deleted after called cleanup, if there is no failpoint.
But in this change, you manually remove it. we couldn't figure it out who removes it.

This is explained in the commit message that does that change :)

But it's because now that we move the os.MkdirTemp() from prepareIDMappedOverlay() to doPrepareIDMappedOverlay(). This makes it simpler to reason and not leak dirs (before it was created in prepareIDMappedOverlay() but removed by the cleanup of doPrepareIDMappedOverlay(), although not always as you couldn't always call that cleanup function. It was messy).

But now that everything is done by that function, we expect this directory to be empty in the "all is fine" case. Therefore, the os.Remove should work if it's empty. If we are in the "err returned" case, we expect the remove to return an error, as there should be directories inside it. This is what the if tc.injectUmountFault below is doing.

This is what I wrote in the commit msg, let me know if it's not clear:

This patch on one hand moves the creation of the tmpDir into the function that will remove it later and renames the parameter. On the other hand, it also separates the cleanup function in two functions: cleanDir() and cleanMount(). Each one will clean the directory or the mount, only if needed, and a new cleanup function that just calls cleanDir() and cleanMount() is returned as the cleanup function handler. Because we now create the tmp directory inside the function, we need to adjust the test: the directory passed as param now will exist, but it shoild be empty. Therefore, we change the os.Stat() to an os.Remove(). If it's not empty (the cleanup didn't work), the remove will fail.

Does it make sense?

fuweid · 2025-08-18T15:22:21Z

+		{
+			name:                    "cleanup callback short failure",
+			callbackFailure:         true,
+			callbackFailureDuration: unmountRetryDelay * 2, // Duration of 2x unmountRetryDelay means it will fail for about two retries.


based on my experience, the CI env is not stable and the CI provider could be overcommitted to run multiple jobs in one node. Once that node is busy, it's easy to be flaky. I don't know how to handle it. may we can implement specific umountWithRetry for IdMapping (it could have more retry) so that we can verify that retry logic.

Ohh, great point. I created a channel that we only send in tests and only when opt-in for this "visibility".

Not sure the complexity is worth it, though, I think removing the tests might be simpler.

What do you think?

To simplify, I removed those tests. Here is a diff in another branch that shows what I did: rata/containerd@rootfs-cleanup-fixes...rata:containerd:rootfs-cleanup-fixes-unmount-tests

I guess we can tackle the test in a follow-up PR once we agree on a strategy?

We can handle it in a follow-up~ Thanks!

@fuweid how do you want to handle this? The diff in the link is a WIP (more things are needed), but it's not really trivial.
I've checked quickly also internal/failpoint, but it's not clear it will be useful for this. Do you know?

I'm not convinced that is worth writing a test, as it will be handled by the mount manager "soon", the wrapper today retries and it's quite a simple function, I don't think other retries are being tested and when you restart containerd any leaked mount is cleaned.

What do you think?

Before this patch, the cleanup was unconditionally trying an unmount, which is not needed if the mount never succeeded in the first place. This causes logs of failed stuff, that should never be there. Also, one function was creating the tmpDir and the nested one removing it (in the cleanup function), which complicates the reasoning about not leaking resources. This patch on one hand moves the creation of the tmpDir into the function that will remove it later and renames the parameter. On the other hand, it also separates the cleanup function in two functions: cleanDir() and cleanMount(). Each one will clean the directory or the mount, only if needed, and a new cleanup function that just calls cleanDir() and cleanMount() is returned as the cleanup function handler. Because we now create the tmp directory inside the function, we need to adjust the test: the directory passed as param now will exist, but it shoild be empty. Therefore, we change the os.Stat() to an os.Remove(). If it's not empty (the cleanup didn't work), the remove will fail. Signed-off-by: Rodrigo Campos <[email protected]>

Signed-off-by: Rodrigo Campos <[email protected]>

@fuwei

When a lot of pods are created all at the same time, the umount fails with EBUSY sometimes. This causes some mounts to be leaked. A way to repro this issue is here: containerd#12139 (comment) While using lsof/fuser it was not possible to get the culprits on time (the mount is busy for a few ms), @fuwei has found what is causing the mount to be busy: containerd#12139 (comment) When we fork in `GetUsernsFD()`, if a call to prepareIDMappedOverlay() is ongoing and has an open fd to the path to idmap, then the unmount callback will fail as the forked process still has an open fd to it. Let's handle the idmap unmounts in the same way other temp mounts are done, using the Unmount() helper, that retries the unmount. This retry fixes it 100% of the times on my system and in the systems of @halaney that reported the issue too. This was originally fixed by containerd#10721 by using a detached mount, but the mount was switched to non-detached again in containerd#10955 and we started to leak mounts. As @fuwei mentioned, using a detached mount is quite invisible for the admin and, therefore, a retry is a better alternative. It seems these umounts are a great candidate for containerd#11303, which will manage the life-cycle of mounts and can handle these retries whenever needed. [1]: To do it, I just run as root "unshare -m", that creates a mntns with private propagation, and then run the containerd daemon. Signed-off-by: Rodrigo Campos <[email protected]>

fuweid

LGTM

github-project-automation Bot added this to Pull Request Review Jul 17, 2025

github-project-automation Bot moved this to Needs Triage in Pull Request Review Jul 17, 2025

k8s-ci-robot added size/L needs-ok-to-test labels Jul 17, 2025

dosubot Bot added area/runtime Runtime kind/bug labels Jul 17, 2025

rata force-pushed the rootfs-cleanup-fixes branch from 2e10ae6 to 937ac52 Compare July 17, 2025 14:45

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Jul 17, 2025

rata force-pushed the rootfs-cleanup-fixes branch from 937ac52 to 4e945b9 Compare July 17, 2025 15:20

rata force-pushed the rootfs-cleanup-fixes branch from 4e945b9 to e6705a2 Compare July 17, 2025 15:50

fuweid self-requested a review July 17, 2025 18:13

halaney reviewed Jul 17, 2025

View reviewed changes

Comment thread core/mount/mount_linux_test.go Outdated

rata force-pushed the rootfs-cleanup-fixes branch from e6705a2 to 2be10e1 Compare July 18, 2025 08:20

rata mentioned this pull request Jul 25, 2025

containerd leaks many <root>/tmpmounts/ovl-idmapped* mounts until restart #12139

Closed

halaney reviewed Aug 7, 2025

View reviewed changes

Comment thread core/mount/mount_linux.go Outdated

halaney approved these changes Aug 7, 2025

View reviewed changes

rata force-pushed the rootfs-cleanup-fixes branch 3 times, most recently from 8a280f0 to 09694d9 Compare August 11, 2025 19:52

rata changed the title ~~core/mount: Fix doPrepareIDMappedOverlay cleanup function~~ core/mount: Fix doPrepareIDMappedOverlay cleanup function and retry temp idmap unmounts Aug 11, 2025

rata force-pushed the rootfs-cleanup-fixes branch from 09694d9 to 456ec54 Compare August 11, 2025 20:46

rata changed the title ~~core/mount: Fix doPrepareIDMappedOverlay cleanup function and retry temp idmap unmounts~~ core/mount: Fix doPrepareIDMappedOverlay cleanup function Aug 11, 2025

rata force-pushed the rootfs-cleanup-fixes branch 3 times, most recently from 56f4467 to 4da6b42 Compare August 12, 2025 19:06

rata force-pushed the rootfs-cleanup-fixes branch from ca86140 to 20826e5 Compare August 14, 2025 21:16

halaney mentioned this pull request Aug 15, 2025

Only idmap once per overlayfs, not per layer #12092

Merged

k8s-ci-robot added the needs-rebase label Aug 17, 2025

rata force-pushed the rootfs-cleanup-fixes branch from 20826e5 to 660b655 Compare August 18, 2025 10:03

k8s-ci-robot removed the needs-rebase label Aug 18, 2025

estesp approved these changes Aug 18, 2025

View reviewed changes

fuweid reviewed Aug 18, 2025

View reviewed changes

rata added 2 commits August 19, 2025 14:58

core/mount: Test cleanup of DoPrepareIDMappedOverlay()

27ba690

Signed-off-by: Rodrigo Campos <[email protected]>

rata force-pushed the rootfs-cleanup-fixes branch 5 times, most recently from 0275aec to b3ed66c Compare August 19, 2025 14:24

rata requested a review from fuweid August 19, 2025 14:31

rata force-pushed the rootfs-cleanup-fixes branch from b3ed66c to da3dc1e Compare August 19, 2025 15:38

fuweid approved these changes Aug 20, 2025

View reviewed changes

github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review Aug 20, 2025

mxpv approved these changes Aug 20, 2025

View reviewed changes

mxpv added this pull request to the merge queue Aug 20, 2025

Merged via the queue into containerd:main with commit 3df1782 Aug 20, 2025
93 of 96 checks passed

github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Aug 20, 2025

rata deleted the rootfs-cleanup-fixes branch August 20, 2025 09:56

fuweid added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch cherry-picked/2.1.x PR commits are cherry picked into the release/2.1 branch labels Aug 26, 2025

Conversation

rata commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Jul 17, 2025

Uh oh!

austinvazquez commented Jul 17, 2025

Uh oh!

rata commented Jul 17, 2025

Uh oh!

rata commented Jul 17, 2025

Uh oh!

halaney left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

halaney left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

halaney left a comment

Choose a reason for hiding this comment

Uh oh!

rata commented Aug 14, 2025

Uh oh!

rata commented Aug 18, 2025

Uh oh!

Uh oh!

fuweid Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

rata Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fuweid Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

rata Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rata Aug 19, 2025

Choose a reason for hiding this comment

Uh oh!

fuweid Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

rata Aug 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fuweid left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

rata commented Jul 17, 2025 •

edited

Loading

rata Aug 19, 2025 •

edited

Loading

rata Aug 19, 2025 •

edited

Loading