Skip to content

Conversation

@AdrianVovk
Copy link
Contributor

@AdrianVovk AdrianVovk commented Jan 21, 2024

Fixes #30640
Fixes #15850
Fixes #15849

@AdrianVovk AdrianVovk force-pushed the slice-freeze-thaw branch 6 times, most recently from 15dc5ab to 2293d81 Compare January 25, 2024 01:11
@github-actions github-actions bot added the tests label Jan 25, 2024
@AdrianVovk
Copy link
Contributor Author

AdrianVovk commented Jan 25, 2024

This also introduces freeze/thaw for mount units via FIFREEZE and FITHAW, but it's really just a proof of concept to show what becomes possible w/ the rework. Unsure if this is something you'd want 🤷

If you do want to keep it, we'd want to change mount_can_freeze to prevent you from freezing the mount that contains /usr? Or just -.mount? Or just let the user shoot themselves in the foot (they can still sysrq their way out of it)

@AdrianVovk AdrianVovk force-pushed the slice-freeze-thaw branch 2 times, most recently from 1214c60 to bb377b9 Compare January 25, 2024 13:15
@AdrianVovk
Copy link
Contributor Author

OK, I think this is ready for a look

CC @msekletar, @Werkov, and @poettering

Also, CC @msizanoen1. I've effectively reverted a14137d. Please take a look

@AdrianVovk AdrianVovk marked this pull request as ready for review January 25, 2024 14:44
@github-actions github-actions bot added the please-review PR is ready for (re-)review by a maintainer label Jan 25, 2024
@poettering
Copy link
Member

/cc @msekletar

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good

src/core/mount.c Outdated
FREEZER_THAW, FREEZER_PARENT_THAW));

unit_next_freezer_state(u, action, &next, &target);
next = freezer_state_finish(next); /* We're completely sync */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ioctl is indeed sync, like all disk accesses. we could theoroetically do this async, i.e. fork off a thread or so. but maybe it's not worth the effort for now, given that disk accesses are pretty much always sync, dunno.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, filesystem freezing should be instant; I don't think it recurses down the tree like a cgroup freeze does. But of course the implementation is filesystem dependent, so 🤷

I think it makes most sense to leave it sync for now, and then if someone discovers a filesystem that does actually recursively do things in response to a freeze request then we make it async

CC @brauner

src/core/mount.c Outdated
} else {
r = RET_NERRNO(ioctl(fd, FITHAW, 0));
if (r == -EINVAL) /* Not frozen by us */
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging is sometimes a bit wonky in pid 1, and needs some clean-ups, but i think we should do log_unit_warning() here for unexpected freeze/thaw errors at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did log_unit_error instead, since we don't ignore the exit value and do report failure to our caller

@poettering
Copy link
Member

so, the last commit, what's your thinking there precisely, how to you intend to make use of this?

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Jan 25, 2024
@AdrianVovk
Copy link
Contributor Author

last commit, what's your thinking there precisely

Part of the motivation for the rework is to clean up the code handling freezing/thawing, which has the effect of making the behavior sane enough that other kinds of units can be frozen/thawed (i.e. not via cgroup) should the use-case arise. The toy example I had in my head for this was a mount unit frozen/thawed via FIFREEZE/FITHAW. It was simple to throw together and so I did as a proof of concept.

how to you intend to make use of this

I don't have a use-case for it. I'm perfectly happy to drop the commit

@poettering
Copy link
Member

I don't have a use-case for it. I'm perfectly happy to drop the commit

so, i am torn. I kinda like it, but note that mount units can also be assigned a slice (and by default are assigned system.slice). We do that because of userspace mount tools, i.e. fuse mounts. That's why mounts also have a cgroup and everything: the processes backing those mounts might run continously. Now if a slice is frozen that has mounts in it, is it right that this also freezes the file systems?

i have no clear answer to this, I am a bit unsure.

I mean, on one hand: if a mount is indeed a fuse mount or something else that forks off a bg process, then freezing this process would typically mean that the fs is effectively frozen too, much like what your patch does for block based file systems...

so i think, if we want to be fully correct we'd freeze the cgroup and the block device for a .mount unit... but this is also problematic, since we ourselves might end up access the path so we might deadlock ourselves?

I mean, how does that work anyway? if the open() we do to issue the thaw ioctl accesses the fs and the fs is frozen, how can we be sure it won't deadlock? should we keep the fd open when we freeze so that we do not have to reopen, but can just fire one more ioctl just like that?

@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Jan 29, 2024
@AdrianVovk
Copy link
Contributor Author

FIFREEZE doesn't prevent reading from the filesystem; it only prevents writing it. It's not like a dm-crypt suspend where reads are also suspended forever. So the path should always be accessible enough to open() it

Fuse does complicate matters a little, because the process that's serving the open() request will itself be frozen. But I think that's manageable: we should just thaw the cgroup before we try to thaw the FS via ioctl. When we freeze we'd do the opposite: freeze via ioctl before freezing the cgroup.

But also: there are no FUSE filesystems on the list we support freezing with, so this may be a non-issue. At least until some FUSE filesystem is added to the list

@brauner
Copy link
Contributor

brauner commented Jan 30, 2024

I mean, how does that work anyway? if the open() we do to issue the thaw ioctl accesses the fs and the fs is frozen, how can we be sure it won't deadlock? should we keep the fd open when we freeze so that we do not have to reopen, but can just fire one more ioctl just like that?

Of course, the VFS layer is designed in a manner that prevents you from deadlocking just by issuing an FIFREEZE or FITHAW ioctl using an file descriptor.

@poettering
Copy link
Member

Of course, the VFS layer is designed in a manner that prevents you from deadlocking just by issuing an FIFREEZE or FITHAW ioctl using an file descriptor.

Right, but that requires us to keep a fd open, right? it would not be safe to start again with open() and a path to a frozen fs, right?

@poettering
Copy link
Member

FIFREEZE doesn't prevent reading from the filesystem; it only prevents writing it. It's not like a dm-crypt suspend where reads are also suspended forever. So the path should always be accessible enough to open() it

Hmm, I thought that this is is now propagate downwards and FIFREEZE actually does also result in a block layer suspend if the mechanism exists there. @brauner did i misunderstand that?

@poettering
Copy link
Member

So I think we shouldn't do the FIFREEZE thing. Because that freezes whole superblocks, not mount points. Thus if you have a single superblock and mount it to 25 places (maybe some of the subdirs too). And then we do "systemctl freeze" on it, then it should not result in all of them to be frozen. But that's what actually would happen here. Hence, I think we should drop that part.

@poettering
Copy link
Member

Looks good to merge if you drop the last commit.

@poettering poettering added good-to-merge/with-minor-suggestions and removed please-review PR is ready for (re-)review by a maintainer labels Jan 30, 2024
This commit overhauls the way freeze/thaw works recursively:

First, it introduces new FreezerActions that are like the existing
FREEZE and THAW but indicate that the action was initiated by a parent
unit. We also refactored the code to pass these FreezerActions through
the whole call stack so that we can make use of them. FreezerState was
extended similarly, to be able to differentiate between a unit that's
frozen manually and a unit that's frozen because a parent is frozen.

Next, slices were changed to check recursively that all their child
units can be frozen before it attempts to freeze them. This is different
from the previous behavior, that would just check if the unit's type
supported freezing at all. This cleans up the code, and also ensures
that the behavior of slices corresponds to the unit's actual ability
to be frozen

Next, we make it so that if you FREEZE a slice, it'll PARENT_FREEZE
all of its children. Similarly, if you THAW a slice it will PARENT_THAW
its children.

Finally, we use the new states available to us to refactor the code
that actually does the cgroup freezing. The code now looks at the unit's
existing freezer state and the action being requested, and decides what
next state is most appropriate. Then it puts the unit in that state.
For instance, a RUNNING unit with a request to PARENT_FREEZE will
put the unit into the PARENT_FREEZING state. As another example, a
FROZEN unit who's parent is also FROZEN will transition to
PARENT_FROZEN in response to a request to THAW.

Fixes systemd#30640
Fixes systemd#15850
Previously, unit_{start,stop,reload} would call the low-level cgroup
unfreeze function whenever a unit was started, stopped, or reloaded. It
did so with no error checking. This call would ultimately recurse up the
cgroup tree, and unfreeze all the parent cgroups of the unit, unless an
error occurred (in which case I have no idea what would happen...)

After the freeze/thaw rework in a previous commit, this can no longer
work. If we recursively thaw the parent cgroups of the unit, there may
be sibling units marked as PARENT_FROZEN which will no longer actually
have frozen parents. Fixing this is a lot more complicated than simply
disallowing start/stop/reload on a frozen unit

Fixes systemd#15849
@AdrianVovk
Copy link
Contributor Author

OK I dropped d09f2c5

I had also broken the tests by renaming the state, so I fixed that also (at least I think. We'll have to see what CI says)

@poettering poettering added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed good-to-merge/with-minor-suggestions labels Jan 30, 2024
@AdrianVovk
Copy link
Contributor Author

CI failures appear unrelated

The various jammy-* tests all fail in TEST-75-RESOLVED

testing-farm:fedora-rawhide-x86_64 is failing due to packaging conflicts

@poettering poettering merged commit 116ce3f into systemd:main Jan 31, 2024
@github-actions github-actions bot removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Jan 31, 2024
@AdrianVovk AdrianVovk deleted the slice-freeze-thaw branch January 31, 2024 18:05
@AdrianVovk
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Incorrect behavior of nested Freeze/Thaw in a Slice Recursive behavior of unit freezing Freezing units vs jobs

4 participants