Rework slice recursive freeze/thaw #31039

AdrianVovk · 2024-01-21T22:22:13Z

AdrianVovk · 2024-01-25T01:12:40Z

This also introduces freeze/thaw for mount units via FIFREEZE and FITHAW, but it's really just a proof of concept to show what becomes possible with the rework. Unsure if this is something you'd want 🤷

If you do want to keep it, we'd want to change mount_can_freeze to prevent you from freezing the mount that contains /usr? Or just -.mount? Or just let the user shoot themselves in the foot (they can still sysrq their way out of it)

AdrianVovk · 2024-01-25T14:44:04Z

OK, I think this is ready for a look

CC @msekletar, @Werkov, and @poettering

Also, CC @msizanoen1. I've effectively reverted a14137d. Please take a look

poettering · 2024-01-25T16:23:26Z

/cc @msekletar

poettering

looks pretty good

src/basic/unit-def.c

src/core/cgroup.c

src/core/dbus-unit.c

src/core/job.h

src/core/mount.c

poettering · 2024-01-25T17:41:48Z

src/core/mount.c

+                              FREEZER_THAW, FREEZER_PARENT_THAW));
+
+        unit_next_freezer_state(u, action, &next, &target);
+        next = freezer_state_finish(next); /* We're completely sync */


the ioctl is indeed sync, like all disk accesses. we could theoretically do this async, i.e. fork off a thread or so. but maybe it's not worth the effort for now, given that disk accesses are pretty much always sync, dunno.

As far as I can tell, filesystem freezing should be instant; I don't think it recurses down the tree like a cgroup freeze does. But of course the implementation is filesystem dependent, so 🤷

I think it makes most sense to leave it sync for now, and then if someone discovers a filesystem that does actually recursively do things in response to a freeze request then we make it async

CC @brauner

poettering · 2024-01-25T17:45:02Z

src/core/mount.c

+        } else {
+                r = RET_NERRNO(ioctl(fd, FITHAW, 0));
+                if (r == -EINVAL) /* Not frozen by us */
+                        return 0;


logging is sometimes a bit wonky in pid 1, and needs some clean-ups, but i think we should do log_unit_warning() here for unexpected freeze/thaw errors at least.

I did log_unit_error instead, since we don't ignore the exit value and do report failure to our caller
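For reference, the error handling discussed in this thread could look roughly like the standalone sketch below. fs_freeze_or_thaw() is a hypothetical helper name, not a function from this PR; the sketch only assumes the FIFREEZE/FITHAW ioctls from <linux/fs.h> and a negative-errno return convention like the one RET_NERRNO() provides in the snippet above. Logging is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <stdbool.h>
#include <sys/ioctl.h>

/* Hypothetical helper (not from the PR): freeze or thaw the filesystem
 * behind an already-open fd. Returns 0 on success, negative errno on error. */
static int fs_freeze_or_thaw(int fd, bool freeze) {
        if (ioctl(fd, freeze ? FIFREEZE : FITHAW, 0) < 0) {
                int e = errno;

                assert(e > 0); /* ioctl() failed, so errno must be set */

                /* Thawing a filesystem that is not frozen reports EINVAL;
                 * as in the reviewed snippet, treat that as "nothing to do". */
                if (!freeze && e == EINVAL)
                        return 0;

                /* Everything else is unexpected: hand it to the caller,
                 * which would log it and report the failure. */
                return -e;
        }

        return 0;
}
```

A caller would open the mount point, pass the fd in, and log any negative return value before propagating it.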

poettering · 2024-01-25T17:45:57Z

so, the last commit, what's your thinking there precisely, how do you intend to make use of this?

AdrianVovk · 2024-01-25T18:28:31Z

> last commit, what's your thinking there precisely

Part of the motivation for the rework is to clean up the code handling freezing/thawing, which has the effect of making the behavior sane enough that other kinds of units can be frozen/thawed (i.e. not via cgroup) should the use-case arise. The toy example I had in my head for this was a mount unit frozen/thawed via FIFREEZE/FITHAW. It was simple to throw together and so I did as a proof of concept.

> how do you intend to make use of this

I don't have a use-case for it. I'm perfectly happy to drop the commit

src/core/job.c

src/basic/mountpoint-util.c

src/core/mount.c

poettering · 2024-01-25T21:22:29Z

> I don't have a use-case for it. I'm perfectly happy to drop the commit

so, i am torn. I kinda like it, but note that mount units can also be assigned a slice (and by default are assigned system.slice). We do that because of userspace mount tools, i.e. fuse mounts. That's why mounts also have a cgroup and everything: the processes backing those mounts might run continuously. Now if a slice is frozen that has mounts in it, is it right that this also freezes the file systems?

i have no clear answer to this, I am a bit unsure.

I mean, on one hand: if a mount is indeed a fuse mount or something else that forks off a bg process, then freezing this process would typically mean that the fs is effectively frozen too, much like what your patch does for block based file systems...

so i think, if we want to be fully correct we'd freeze the cgroup and the block device for a .mount unit... but this is also problematic, since we ourselves might end up accessing the path so we might deadlock ourselves?

I mean, how does that work anyway? if the open() we do to issue the thaw ioctl accesses the fs and the fs is frozen, how can we be sure it won't deadlock? should we keep the fd open when we freeze so that we do not have to reopen, but can just fire one more ioctl just like that?

AdrianVovk · 2024-01-29T21:30:40Z

FIFREEZE doesn't prevent reading from the filesystem; it only prevents writing to it. It's not like a dm-crypt suspend where reads are also suspended forever. So the path should always be accessible enough to open() it

Fuse does complicate matters a little, because the process that's serving the open() request will itself be frozen. But I think that's manageable: we should just thaw the cgroup before we try to thaw the FS via ioctl. When we freeze we'd do the opposite: freeze via ioctl before freezing the cgroup.
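To make the proposed ordering concrete, here is a tiny sketch. The FreezeStep enum and mount_freezer_plan() are hypothetical names invented for illustration, not code from this PR; the sketch only records which step would run first in each direction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step identifiers for a .mount unit with a backing process. */
typedef enum FreezeStep {
        STEP_FS_IOCTL,  /* FIFREEZE or FITHAW on the mount point */
        STEP_CGROUP,    /* freeze or thaw the unit's cgroup */
} FreezeStep;

/* Fills plan[] with the order of operations suggested in the comment above. */
static void mount_freezer_plan(bool freeze, FreezeStep plan[2]) {
        assert(plan);

        if (freeze) {
                /* The FUSE daemon must still be running to serve our open(),
                 * so the filesystem is frozen before the cgroup. */
                plan[0] = STEP_FS_IOCTL;
                plan[1] = STEP_CGROUP;
        } else {
                /* Wake the daemon first so open() + FITHAW cannot hang on it. */
                plan[0] = STEP_CGROUP;
                plan[1] = STEP_FS_IOCTL;
        }
}
```

The thaw path is simply the freeze path reversed, which keeps the daemon alive whenever the path has to be opened.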

But also: there are no FUSE filesystems on the list we support freezing with, so this may be a non-issue. At least until some FUSE filesystem is added to the list

brauner · 2024-01-30T09:25:38Z

> I mean, how does that work anyway? if the open() we do to issue the thaw ioctl accesses the fs and the fs is frozen, how can we be sure it won't deadlock? should we keep the fd open when we freeze so that we do not have to reopen, but can just fire one more ioctl just like that?

Of course, the VFS layer is designed in a manner that prevents you from deadlocking just by issuing an FIFREEZE or FITHAW ioctl using a file descriptor.

poettering · 2024-01-30T14:13:46Z

> Of course, the VFS layer is designed in a manner that prevents you from deadlocking just by issuing an FIFREEZE or FITHAW ioctl using a file descriptor.

Right, but that requires us to keep a fd open, right? it would not be safe to start again with open() and a path to a frozen fs, right?

poettering · 2024-01-30T14:17:36Z

> FIFREEZE doesn't prevent reading from the filesystem; it only prevents writing to it. It's not like a dm-crypt suspend where reads are also suspended forever. So the path should always be accessible enough to open() it

Hmm, I thought that this is now propagated downwards and FIFREEZE actually does also result in a block layer suspend if the mechanism exists there. @brauner did i misunderstand that?

poettering · 2024-01-30T14:21:57Z

So I think we shouldn't do the FIFREEZE thing, because that freezes whole superblocks, not mount points. If you have a single superblock mounted in 25 places (maybe some of its subdirs too) and we do "systemctl freeze" on one of them, that should not result in all of them being frozen. But that's what would actually happen here. Hence, I think we should drop that part.

poettering · 2024-01-30T14:23:42Z

Looks good to merge if you drop the last commit.

This commit overhauls the way freeze/thaw works recursively:

First, it introduces new FreezerActions that are like the existing FREEZE and THAW but indicate that the action was initiated by a parent unit. We also refactored the code to pass these FreezerActions through the whole call stack so that we can make use of them. FreezerState was extended similarly, to be able to differentiate between a unit that's frozen manually and a unit that's frozen because a parent is frozen.

Next, slices were changed to check recursively that all their child units can be frozen before they attempt to freeze them. This is different from the previous behavior, which would just check if the unit's type supported freezing at all. This cleans up the code, and also ensures that the behavior of slices corresponds to the unit's actual ability to be frozen.

Next, we make it so that if you FREEZE a slice, it'll PARENT_FREEZE all of its children. Similarly, if you THAW a slice it will PARENT_THAW its children.

Finally, we use the new states available to us to refactor the code that actually does the cgroup freezing. The code now looks at the unit's existing freezer state and the action being requested, and decides what next state is most appropriate. Then it puts the unit in that state. For instance, a RUNNING unit with a request to PARENT_FREEZE will put the unit into the PARENT_FREEZING state. As another example, a FROZEN unit whose parent is also FROZEN will transition to PARENT_FROZEN in response to a request to THAW.

Fixes systemd#30640
Fixes systemd#15850
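The "current state plus requested action gives next state" logic described in this commit message can be sketched as a small transition function. This is a simplified illustration, not the code from the PR: the STATE_*/ACTION_* names, the parent_frozen parameter and every transition other than the two examples given in the message are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum FreezerState {
        STATE_RUNNING,
        STATE_FREEZING,         /* frozen manually, in progress */
        STATE_FROZEN,           /* frozen manually */
        STATE_PARENT_FREEZING,  /* frozen because a parent is, in progress */
        STATE_PARENT_FROZEN,    /* frozen because a parent is */
        STATE_THAWING,
} FreezerState;

typedef enum FreezerAction {
        ACTION_FREEZE,
        ACTION_THAW,
        ACTION_PARENT_FREEZE,
        ACTION_PARENT_THAW,
} FreezerAction;

/* Hypothetical sketch: pick the next state from the current state, the
 * requested action, and whether the parent slice is itself frozen. */
static FreezerState next_freezer_state(FreezerState cur, FreezerAction action, bool parent_frozen) {
        switch (action) {

        case ACTION_FREEZE: /* a manual freeze takes precedence over a parent one */
                if (cur == STATE_FROZEN || cur == STATE_PARENT_FROZEN)
                        return STATE_FROZEN;
                return STATE_FREEZING;

        case ACTION_PARENT_FREEZE: /* never downgrade a manual freeze */
                if (cur == STATE_RUNNING || cur == STATE_THAWING)
                        return STATE_PARENT_FREEZING;
                return cur;

        case ACTION_THAW:
                if (parent_frozen) {
                        /* The unit stays frozen, but now only because of its parent. */
                        if (cur == STATE_FROZEN)
                                return STATE_PARENT_FROZEN;
                        if (cur == STATE_FREEZING)
                                return STATE_PARENT_FREEZING;
                        return cur;
                }
                return cur == STATE_RUNNING ? STATE_RUNNING : STATE_THAWING;

        case ACTION_PARENT_THAW: /* only undo what the parent caused */
                if (cur == STATE_PARENT_FREEZING || cur == STATE_PARENT_FROZEN)
                        return STATE_THAWING;
                return cur;
        }

        assert(false); /* unreachable for valid actions */
        return cur;
}
```

With these rules, a RUNNING unit given PARENT_FREEZE moves to PARENT_FREEZING, and a FROZEN unit under a frozen parent given THAW moves to PARENT_FROZEN, matching the two examples in the message.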

Previously, unit_{start,stop,reload} would call the low-level cgroup unfreeze function whenever a unit was started, stopped, or reloaded. It did so with no error checking. This call would ultimately recurse up the cgroup tree and unfreeze all the parent cgroups of the unit, unless an error occurred (in which case I have no idea what would happen...).

After the freeze/thaw rework in a previous commit, this can no longer work. If we recursively thaw the parent cgroups of the unit, there may be sibling units marked as PARENT_FROZEN which will no longer actually have frozen parents.

Fixing this is a lot more complicated than simply disallowing start/stop/reload on a frozen unit.

Fixes systemd#15849
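As I read this message, the simpler route is to refuse start/stop/reload while a unit is not in the RUNNING freezer state. A minimal sketch of such a guard follows; SketchUnit, sketch_unit_start() and the choice of -EBUSY as the error code are all assumptions made for illustration, not taken from the PR.

```c
#include <assert.h>
#include <errno.h>

typedef enum FreezerState {
        STATE_RUNNING,
        STATE_FREEZING,
        STATE_FROZEN,
        STATE_PARENT_FREEZING,
        STATE_PARENT_FROZEN,
        STATE_THAWING,
} FreezerState;

/* Hypothetical stand-in for a unit; only the field needed here. */
typedef struct SketchUnit {
        FreezerState freezer_state;
} SketchUnit;

/* Refuse to start a unit that is frozen (or in transition) instead of
 * silently thawing its parent cgroups behind the caller's back. */
static int sketch_unit_start(const SketchUnit *u) {
        assert(u);

        if (u->freezer_state != STATE_RUNNING)
                return -EBUSY; /* assumed error code, see lead-in */

        /* ... the actual start logic would run here ... */
        return 0;
}
```

The same check would apply to the stop and reload paths, so sibling units marked PARENT_FROZEN never end up with a thawed parent.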

AdrianVovk · 2024-01-30T16:18:30Z

OK I dropped d09f2c5

I had also broken the tests by renaming the state, so I fixed that also (at least I think. We'll have to see what CI says)

AdrianVovk · 2024-01-30T18:42:28Z

CI failures appear unrelated

The various jammy-* tests all fail in TEST-75-RESOLVED

testing-farm:fedora-rawhide-x86_64 is failing due to packaging conflicts

AdrianVovk · 2024-01-31T18:05:22Z

Thanks!

github-actions bot added the util-lib label Jan 21, 2024

AdrianVovk force-pushed the slice-freeze-thaw branch 6 times, most recently from 15dc5ab to 2293d81 January 25, 2024 01:11

github-actions bot added the tests label Jan 25, 2024

AdrianVovk force-pushed the slice-freeze-thaw branch 2 times, most recently from 1214c60 to bb377b9 January 25, 2024 13:15

AdrianVovk marked this pull request as ready for review January 25, 2024 14:44

github-actions bot added the please-review label Jan 25, 2024

poettering requested changes Jan 25, 2024

poettering added reviewed/needs-rework 🔨 and removed please-review labels Jan 25, 2024

YHNdnzj requested changes Jan 25, 2024

YHNdnzj reviewed Jan 25, 2024

YHNdnzj reviewed Jan 25, 2024

AdrianVovk force-pushed the slice-freeze-thaw branch from bb377b9 to 4f22aa3 January 29, 2024 21:24

github-actions bot added please-review and removed reviewed/needs-rework 🔨 labels Jan 29, 2024

AdrianVovk force-pushed the slice-freeze-thaw branch from 4f22aa3 to d09f2c5 January 29, 2024 21:48

poettering added good-to-merge/with-minor-suggestions and removed please-review labels Jan 30, 2024

AdrianVovk added 2 commits January 30, 2024 11:18

AdrianVovk force-pushed the slice-freeze-thaw branch from d09f2c5 to 4cb2e6a January 30, 2024 16:18

poettering added good-to-merge/waiting-for-ci 👍 and removed good-to-merge/with-minor-suggestions labels Jan 30, 2024

poettering merged commit 116ce3f into systemd:main Jan 31, 2024

github-actions bot removed the good-to-merge/waiting-for-ci 👍 label Jan 31, 2024

AdrianVovk deleted the slice-freeze-thaw branch January 31, 2024 18:05
