Retry unmount on EBUSY and return errors#1868
Conversation
|
@dmcgowan @Random-Liu @yanxuean Please take a look at this fix to ensure unmount works or we get an error. I remember in the past cadvisor causing things like this as its crawing the rootfs of mounts. If cadvisor is running on the CI this could be why we are seeing it here. The logic for unmounting is in the |
e380ba4 to
fc85e7a
Compare
This is another WIP to fix containerd#1785. Signed-off-by: Michael Crosby <[email protected]>
fc85e7a to
b0ca685
Compare
Codecov Report
@@ Coverage Diff @@
## master #1868 +/- ##
==========================================
- Coverage 49.18% 49.07% -0.12%
==========================================
Files 86 86
Lines 8244 8263 +19
==========================================
Hits 4055 4055
- Misses 3519 3538 +19
Partials 670 670
Continue to review full report at Codecov.
|
| err = filepath.Walk(root, incrementFS(root, uid, gid)) | ||
| if uerr := mount.Unmount(root, 0); err == nil { | ||
| err = uerr | ||
| } |
There was a problem hiding this comment.
In the case where both error do you think it would make sense to log.Warn() the unmount error? Or is the error message likely to be basically the same as the original walk error and doesn't provide any extra information?
| return nil | ||
| } | ||
|
|
||
| func unmount(target string, flags int) error { |
There was a problem hiding this comment.
This doesn't actually fix the issue. It just reduce the chance, right?
Should we still continue os.RemoveAll when unmount fails?
There was a problem hiding this comment.
We have had this type of logic in the past for unmounting, EBUSY/EAGAIN means "try again" so its still an other all valid fix for the type of issue.
@dmcgowan suggested that we change RemoveAll to Remove so that we either leak a temp dir if it fails but not RM snapshot data.
|
Can we support |
|
@Random-Liu ya, was just pushing the change now |
Signed-off-by: Michael Crosby <[email protected]>
b8231cf to
a8b543f
Compare
|
LGTM Support for timeout and additional logging sound good, but can be done in follow up. Let's get this merged so we can verify the fix. |
| return emptyDesc, errors.Wrap(err, "failed to create temporary directory") | ||
| } | ||
| defer os.RemoveAll(dir) | ||
| defer os.Remove(dir) |
There was a problem hiding this comment.
we should output unmount err and return error When unmount fail .
This is another WIP to fix #1785.
Signed-off-by: Michael Crosby [email protected]