Check that cgroup is empty before deleting #228
The kernel blocks any attempt to `rmdir` a cgroup path that still has running processes in it. Because the removal code uses the standard `os.RemoveAll` function, the Go runtime helpfully falls back to an `rm -rf`-like removal algorithm if the cheap `unlink`/`rmdir` attempts fail. Since cgroup files cannot be removed, the caller ends up with an unhelpful "unlinkat /sys/fs/cgroup/.../cgroup.events: operation not permitted" error that gives no actionable information, because the original error was swallowed by the runtime.
This changes the `Delete` functions to detect cgroups that still have running processes and return an error indicating that removal cannot be done.
Signed-off-by: Josh Seba <[email protected]>
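The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parsePIDs` and `deleteIfEmpty` are hypothetical names, reading `cgroup.procs` directly is an assumption about how the process list is obtained, and calling `os.Remove` instead of `os.RemoveAll` is a choice made for the sketch so that a kernel refusal surfaces without the recursive fallback.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parsePIDs parses the contents of a cgroup.procs file, which lists
// one PID per line. (Hypothetical helper for this sketch.)
func parsePIDs(contents string) ([]int, error) {
	var pids []int
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil {
			return nil, fmt.Errorf("invalid pid %q: %w", line, err)
		}
		pids = append(pids, pid)
	}
	return pids, sc.Err()
}

// deleteIfEmpty refuses to remove a cgroup directory that still lists
// processes, and otherwise removes it with a single non-recursive call.
// (Hypothetical helper; assumes path is a cgroup directory.)
func deleteIfEmpty(path string) error {
	data, err := os.ReadFile(filepath.Join(path, "cgroup.procs"))
	if err != nil {
		return err
	}
	pids, err := parsePIDs(string(data))
	if err != nil {
		return err
	}
	if len(pids) > 0 {
		return fmt.Errorf("cgroups: unable to remove path %q: still contains running processes", path)
	}
	// os.Remove does not recurse, so a failure here is reported as-is.
	return os.Remove(path)
}

func main() {
	// Parsing demo on sample contents; no real cgroup is touched.
	pids, err := parsePIDs("1234\n5678\n")
	fmt.Println(pids, err)

	// Optionally try a real path passed on the command line.
	if len(os.Args) == 2 {
		if err := deleteIfEmpty(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
}
```

With this shape, a caller sees the "still contains running processes" message instead of an `unlinkat ... operation not permitted` error on one of the cgroup's interface files.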
    return err
}
if len(processes) > 0 {
    return fmt.Errorf("cgroups: unable to remove path %q: still contains running processes", c.path)
Is it possible to have a similar message on cgroup.go?
The v1 `Delete` just accumulates a list of subsystems that couldn't be removed and doesn't report the underlying error. I could make the append more like:
errs = append(errs, fmt.Sprintf("%s (contains running processes)", string(s.Name())))
Would that be a solution? The resulting error would then be akin to:
cgroups: unable to remove paths memory (contains running processes), cpu (contains running processes)
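To make the proposed message format concrete, here is a small sketch of how the per-subsystem entries could be joined into one error. The `failure` type and the `describe` and `combine` functions are hypothetical, and the `cgroups: unable to remove paths` prefix is taken from the example message above rather than from the library source.

```go
package main

import (
	"fmt"
	"strings"
)

// failure is a hypothetical record of one subsystem that could not be removed.
type failure struct {
	name     string
	hasProcs bool
}

// describe renders one entry, adding the reason when it is known.
func describe(f failure) string {
	if f.hasProcs {
		return fmt.Sprintf("%s (contains running processes)", f.name)
	}
	return f.name
}

// combine joins all entries into a single error, or returns nil when
// nothing failed.
func combine(failures []failure) error {
	if len(failures) == 0 {
		return nil
	}
	parts := make([]string, 0, len(failures))
	for _, f := range failures {
		parts = append(parts, describe(f))
	}
	return fmt.Errorf("cgroups: unable to remove paths %s", strings.Join(parts, ", "))
}

func main() {
	err := combine([]failure{
		{name: "memory", hasProcs: true},
		{name: "cpu", hasProcs: true},
	})
	fmt.Println(err)
}
```

Entries without a known reason fall back to the bare name, which keeps the existing message shape for other kinds of removal failure.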
Apologies for not getting back to this sooner. I added the "contains running processes" reason to the errors list in cgroup.go.
If someone gets a chance to take a look at this again, I'd appreciate it! It bubbled up to the top of our ticket tracker to remind me to check on it. If there are any blockers, let me know!
    if err := remove(path); err != nil {
        errs = append(errs, path)
    }
    continue
Is this `continue` necessary? I don't see how it changes the loop flow, since this is the end of the `for` body and execution will naturally move on to the next subsystem in the range.
It's not strictly necessary, no. I added it because, from my reading of the code structure here, it works like a switch case where only one branch should match. If a third case were ever added (very unlikely, since cgroup v1 is now legacy code), the `continue` would keep the behavior correct.
Looking at it a bit more, I think it would be better if those two `if` blocks were restructured to express the intent that only one interface is expected to match. What do you think of something like this? Would it be more readable?
switch s.(type) {
case deleter:
...
case pather:
...
}
`continue` looks okay to me, to be honest. A `switch` could work, but it couldn't cover the `if len(procs) > 0 {` case, which is another `continue` case.
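The loop shape being discussed can be illustrated with the sketch below. Everything in it is a stand-in: the `subsystem`, `deleter`, and `pather` interfaces only approximate the ones named in the thread, `deleteAll` and `demo` are hypothetical, and the process count and removal function are injected so the example runs without a real cgroup hierarchy. It shows three early exits per iteration, the first of which is not a type check and so would not fit into a type switch.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the interfaces named in the thread.
type subsystem interface{ Name() string }

type deleter interface {
	subsystem
	Delete(path string) error
}

type pather interface {
	subsystem
	Path(path string) string
}

type fakeDeleter struct{ name string }

func (f fakeDeleter) Name() string        { return f.name }
func (f fakeDeleter) Delete(string) error { return nil }

type fakePather struct{ name string }

func (f fakePather) Name() string         { return f.name }
func (f fakePather) Path(p string) string { return "/fake/" + f.name + p }

// deleteAll visits each subsystem and takes exactly one branch per iteration.
func deleteAll(subsystems []subsystem, path string, procs func(name string) int, remove func(string) error) []string {
	var errs []string
	for _, s := range subsystems {
		// Not a type check, so it cannot be a case in a type switch.
		if procs(s.Name()) > 0 {
			errs = append(errs, fmt.Sprintf("%s (contains running processes)", s.Name()))
			continue
		}
		if d, ok := s.(deleter); ok {
			if err := d.Delete(path); err != nil {
				errs = append(errs, s.Name())
			}
			continue
		}
		if p, ok := s.(pather); ok {
			full := p.Path(path)
			if err := remove(full); err != nil {
				errs = append(errs, full)
			}
			continue
		}
	}
	return errs
}

// demo runs the loop over fake subsystems: one busy, one removable,
// and one whose removal fails.
func demo() []string {
	subsystems := []subsystem{
		fakeDeleter{name: "memory"},
		fakeDeleter{name: "devices"},
		fakePather{name: "cpu"},
	}
	procs := func(name string) int {
		if name == "memory" {
			return 2
		}
		return 0
	}
	remove := func(p string) error { return errors.New("remove failed: " + p) }
	return deleteAll(subsystems, "/test", procs, remove)
}

func main() {
	for _, e := range demo() {
		fmt.Println(e)
	}
}
```

The final `continue` is redundant for control flow, as noted in the review, but it keeps the three branches visually parallel.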
Confirmed with @estesp offline that this is not a blocker.