Rework slice recursive freeze/thaw #31039
Conversation
Force-pushed from 15dc5ab to 2293d81.
This also introduces freeze/thaw for mount units via FIFREEZE and FITHAW, but it's really just a proof of concept to show what becomes possible with the rework. Unsure if this is something you'd want 🤷 If you do want to keep it, we'd want to change
Force-pushed from 1214c60 to bb377b9.
OK, I think this is ready for a look. CC @msekletar, @Werkov, and @poettering. Also CC @msizanoen1: I've effectively reverted a14137d. Please take a look.
/cc @msekletar
poettering left a comment:
looks pretty good
src/core/mount.c (outdated):

```c
                               FREEZER_THAW, FREEZER_PARENT_THAW));
        ...
        unit_next_freezer_state(u, action, &next, &target);
        next = freezer_state_finish(next); /* We're completely sync */
```
the ioctl is indeed sync, like all disk accesses. we could theoretically do this async, i.e. fork off a thread or so. but maybe it's not worth the effort for now, given that disk accesses are pretty much always sync, dunno.
As far as I can tell, filesystem freezing should be instant; I don't think it recurses down the tree like a cgroup freeze does. But of course the implementation is filesystem-dependent, so 🤷

I think it makes the most sense to leave it sync for now; if someone later discovers a filesystem that actually does recursive work in response to a freeze request, we can make it async then.
CC @brauner
src/core/mount.c (outdated):

```c
        } else {
                r = RET_NERRNO(ioctl(fd, FITHAW, 0));
                if (r == -EINVAL) /* Not frozen by us */
                        return 0;
```
logging is sometimes a bit wonky in pid 1, and needs some clean-ups, but i think we should do log_unit_warning() here for unexpected freeze/thaw errors at least.
I did log_unit_error instead, since we don't ignore the exit value and do report failure to our caller.
so, the last commit, what's your thinking there precisely, how do you intend to make use of this?
Part of the motivation for the rework is to clean up the code handling freezing/thawing, which has the effect of making the behavior sane enough that other kinds of units can be frozen/thawed (i.e. not via cgroup) should the use-case arise. The toy example I had in my head for this was a mount unit frozen/thawed via FIFREEZE/FITHAW. It was simple to throw together and so I did as a proof of concept.
I don't have a use-case for it. I'm perfectly happy to drop the commit.
so, i am torn. I kinda like it, but note that mount units can also be assigned a slice (and by default are assigned system.slice). We do that because of userspace mount tools, i.e. fuse mounts. That's why mounts also have a cgroup and everything: the processes backing those mounts might run continuously.

Now, if a slice is frozen that has mounts in it, is it right that this also freezes the file systems? i have no clear answer to this, I am a bit unsure. I mean, on one hand: if a mount is indeed a fuse mount or something else that forks off a bg process, then freezing this process would typically mean that the fs is effectively frozen too, much like what your patch does for block-based file systems... so i think, if we want to be fully correct we'd freeze both the cgroup and the block device for a .mount unit...

but this is also problematic, since we ourselves might end up accessing the path, so we might deadlock ourselves? I mean, how does that work anyway? if the open() we do to issue the thaw ioctl accesses the fs and the fs is frozen, how can we be sure it won't deadlock? should we keep the fd open when we freeze so that we do not have to reopen, but can just fire one more ioctl just like that?
Force-pushed from bb377b9 to 4f22aa3.
Fuse does complicate matters a little, because the process that's serving the

But also: there are no FUSE filesystems on the list we support freezing with, so this may be a non-issue. At least until some FUSE filesystem is added to the list.
Force-pushed from 4f22aa3 to d09f2c5.
Of course, the VFS layer is designed in a manner that prevents you from deadlocking just by issuing an FIFREEZE or FITHAW ioctl using a file descriptor.
Right, but that requires us to keep an fd open, right? It would not be safe to start again with open() and a path to a frozen fs, right?
Hmm, I thought that this is now propagated downwards and
So I think we shouldn't do the FIFREEZE thing, because that freezes whole superblocks, not mount points. Thus, if you have a single superblock and mount it in 25 places (maybe some of its subdirs too), and we then do "systemctl freeze" on one of them, it should not result in all of them being frozen. But that's what would actually happen here. Hence, I think we should drop that part.
Looks good to merge if you drop the last commit. |
This commit overhauls the way freeze/thaw works recursively:

First, it introduces new FreezerActions that are like the existing FREEZE and THAW but indicate that the action was initiated by a parent unit. The code was also refactored to pass these FreezerActions through the whole call stack so that we can make use of them. FreezerState was extended similarly, to be able to differentiate between a unit that's frozen manually and a unit that's frozen because a parent is frozen.

Next, slices were changed to check recursively that all their child units can be frozen before attempting to freeze them. This is different from the previous behavior, which would just check whether the unit's type supported freezing at all. This cleans up the code, and also ensures that the behavior of slices corresponds to the units' actual ability to be frozen.

Next, we make it so that if you FREEZE a slice, it'll PARENT_FREEZE all of its children. Similarly, if you THAW a slice it will PARENT_THAW its children.

Finally, we use the new states available to us to refactor the code that actually does the cgroup freezing. The code now looks at the unit's existing freezer state and the action being requested, decides what next state is most appropriate, and then puts the unit in that state. For instance, a RUNNING unit with a request to PARENT_FREEZE will be put into the PARENT_FREEZING state. As another example, a FROZEN unit whose parent is also FROZEN will transition to PARENT_FROZEN in response to a request to THAW.

Fixes systemd#30640
Fixes systemd#15850
Previously, unit_{start,stop,reload} would call the low-level cgroup
unfreeze function whenever a unit was started, stopped, or reloaded. It
did so with no error checking. This call would ultimately recurse up the
cgroup tree, and unfreeze all the parent cgroups of the unit, unless an
error occurred (in which case I have no idea what would happen...)
After the freeze/thaw rework in a previous commit, this can no longer
work. If we recursively thaw the parent cgroups of the unit, there may
be sibling units marked as PARENT_FROZEN which will no longer actually
have frozen parents. Fixing this is a lot more complicated than simply
disallowing start/stop/reload on a frozen unit.
Fixes systemd#15849
Force-pushed from d09f2c5 to 4cb2e6a.
OK, I dropped d09f2c5. I had also broken the tests by renaming the state, so I fixed that as well (at least I think; we'll have to see what CI says).
CI failures appear unrelated. The various
Thanks! |
Fixes #30640
Fixes #15850
Fixes #15849