When we are about to start a unit, check the deps again by poettering · Pull Request #4733 · systemd/systemd

poettering · 2016-11-24T19:22:07Z

Handle #4725 a bit better, by making a job for a unit that has a bindsto= but no after= fail ultimately when the bound to unit isn't up when we are about to start it.

evverx · 2016-11-24T22:23:08Z

src/core/unit.c

+         * taken care of this already, but let's check this here again. After all, the dependencies might have changed
+         * due to a reload or because the user forgot to place ordering dependencies in conjunction with BindsTo=. */
+        if (!unit_verify_deps(u))
+                return -ENOLINK;


I'm reading the docs

BindsTo=
Configures requirement dependencies, very similar in style to Requires=, however in addition to this behavior, it also declares that this unit is stopped when any of the units listed
suddenly disappears. Units can suddenly, unexpectedly disappear if a service terminates on its own choice, a device is unplugged or a mount point unmounted without involvement of systemd.

It's not so obvious why BindsTo= requires After=

Also, this commit changes the semantic of the BindsTo=:

-bash-4.3# systemctl cat a --no-pager # /etc/systemd/system/a.service [Service] Type=oneshot ExecStart=/bin/sh -x -c 'echo Hey' RemainAfterExit=true -bash-4.3# systemctl cat b --no-pager # /etc/systemd/system/b.service [Unit] BindsTo=a.service [Service] ExecStart=/bin/sh -x -c 'echo B' Type=oneshot RemainAfterExit=true

v229 works

this PR:

-bash-4.3# systemctl start b A dependency job for b.service failed. See 'journalctl -xe' for details. -bash-4.3# journalctl -u a -u b --no-pager -- Logs begin at Thu 2016-11-24 22:05:00 UTC, end at Thu 2016-11-24 22:12:25 UTC. -- Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Trying to enqueue job b.service/start/replace Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Installed new job a.service/start as 244 Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Installed new job b.service/start as 194 Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Enqueued job b.service/start as 194 Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Job b.service/start finished, result=dependency Nov 24 22:06:48 systemd-testsuite systemd[1]: Dependency failed for b.service. Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Job b.service/start failed with result 'dependency'. Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Passing 0 fds to service Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: About to execute: /bin/sh -x -c 'echo Hey' Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Forked /bin/sh as 227 Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Changed dead -> start Nov 24 22:06:48 systemd-testsuite systemd[227]: a.service: Executing: /bin/sh -x -c 'echo Hey' Nov 24 22:06:48 systemd-testsuite systemd[1]: Starting a.service... Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Collecting. Nov 24 22:06:48 systemd-testsuite sh[227]: + echo Hey Nov 24 22:06:48 systemd-testsuite sh[227]: Hey Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Child 227 belongs to a.service Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Main process exited, code=exited, status=0/SUCCESS Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Changed start -> exited Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Job a.service/start finished, result=done Nov 24 22:06:48 systemd-testsuite systemd[1]: Started a.service. Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: cgroup is empty Nov 24 22:09:35 systemd-testsuite systemd[1]: b.service: Collecting. Nov 24 22:09:35 systemd-testsuite systemd[1]: b.service: Collecting.

And I'm not sure this is the right thing to do

I'm reading #4725 (comment)
Why not

b) add in After= if BindsTo= is used without ordering

?

I reread the docs. And I don't really understand why BindsTo= requires After=

Indeed in this context it's not clear at all. You mostly want to say "stop b if a goes away", but don't care about the ordering, and with a normal Requires= it is not a requirement to add After= as well. It still makes sense in the context of .device units (although this is mostly for an internal/technical reason, in the sense that systemd needs to "import" udev devices), but not here indeed.

martinpitt · 2016-11-25T11:10:15Z

I killed the autopkgtests as they cause trouble: Shutdown hangs with

[[0;32m  OK  [0m] Stopped target Encrypted Volumes.
[[0;32m  OK  [0m] Stopped Forward Password Requests to Wall Directory Watch.
[[0;32m  OK  [0m] Stopped Dispatch Password Requests to Console Directory Watch.
[[0;32m  OK  [0m] Stopped target Swap.
[[0;32m  OK  [0m] Stopped Load/Save Random Seed.
[[0;32m  OK  [0m] Stopped Update UTMP about System Boot/Shutdown.
[[0;32m  OK  [0m] Stopped Network Time Synchronization.
[[0;32m  OK  [0m] Stopped Create Volatile Files and Directories.
[[0;32m  OK  [0m] Stopped target Local File Systems.
         Unmounting /run/lock...
         Unmounting /run/user/1000...
[[0;32m  OK  [0m] Unmounted /run/lock.
[[0;32m  OK  [0m] Unmounted /run/user/1000.
[[0;32m  OK  [0m] Reached target Unmount All Filesystems.
[[0;32m  OK  [0m] Stopped target Local File Systems (Pre).
         Stopping Monitoring of LVM2 mirrors?ng dmeventd or progress polling...
[[0;32m  OK  [0m] Stopped Remount Root and Kernel File Systems.
[[0;32m  OK  [0m] Stopped Create Static Device Nodes in /dev.
[[0;1;33mDEPEND[0m] Dependency failed for Shutdown.

and the testbed never actually reboots, thus it times out waiting for ssh. Looks like the "default" test on Fedora suffers from the same, although the output is a bit hard to read.

poettering · 2016-11-25T16:05:44Z

@martinpitt hmm, that failure is really interesting... Any chance we can get debug logging turned on for such tests? That would be very helpful for tracking down the issue here...

poettering · 2016-11-25T16:15:49Z

It's not so obvious why BindsTo= requires After=

It doesn't really actually.

BindsTo= just means: whatever happens, don't permit A running if it has BindsTo=B and B isn't running. The man page doesn't say that overly clearly, but it's kinda what already was in place, except in a racy fashion: we'd not detect this at all the right times, but only if a service dies, but not when it is in the process of being started. That was a mistake, and is highly racy: we'd check for bindsto on service exit, but not on service start, so if you combined multiple bindsto on a single service and then have some of them exit and some of them start, then we'd notice that the start wasn't complete yet, just becaus eof the eixt, if you follow at all what I am saying here...

Now, BindsTo= doesn't "require" After=, but it makes a lot of sense to use them together. If you don't then no ordering is applied, and that means if you have A BindsTo= B, and start both, then A might be started before B is ready which would then fail... If you add in the order however it's clear that A shouldn't be started anyway before B is up, hence all is good.

poettering · 2016-11-25T16:20:21Z

i'll extend the docs a bit, to make clear what "bindsto" really means: i.e. that a Unit is actually really "Bound" to another one, and will never be up without it...

evverx · 2016-11-25T16:26:08Z

@poettering , well, I see the point. But

a) warn if BindsTo= is used without ordering
b) add in After= if BindsTo= is used without ordering
c) if a unit is upped that bindsto another unit, fail it immediately is the requirements aren't met, with a proper error message in the logs.

Why not b) ?
Why should we break all services with BindsTo= and without After=?

poettering · 2016-11-25T16:30:46Z

Why not c) ?
Why should we break all services with BindsTo= and without After=?

Well, they are already broken, it's just harder to trigger. There's code in place that shuts down a unit if any of its BindsTo= deps are not up. This code got called under a number of conditions before, but not in all of them, we the device GC case we call it at one more occasion, so it suddenly detects some corner case quicker. Now, the code could also get invoked due to other reasons, hence there was always a race: things were safe only as long as this piece of code never got invoked, but that's a special case really...

Essentially, the GC patch really just made the race more visible, it was always there, and it always was due to user misconfig...

evverx · 2016-11-25T16:46:39Z

Essentially, the GC patch really just made the race more visible, it was always there, and it always was due to user misconfig...

Right. But [email protected] worked (most of the time) :-)

Anyway, I think you're right.

i'll extend the docs a bit, to make clear what "bindsto" really means

Did you mean the man page? Or the NEWS too?

poettering · 2016-11-25T16:47:44Z

Did you mean the man page? Or the NEWS too?

I meant the man page, but I figure we can mention this in the NEWS file, too

arvidjaar · 2016-11-25T17:05:59Z

@poettering

whatever happens, don't permit A running if it has BindsTo=B and B isn't running

Do you mean that if A Requires B it is permissible to start A even if B is not up or cannot be brought up? I am afraid I am rather lost now in actual semantic of both Requires and BindsTo.

@evverx

add in After= if BindsTo= is used without ordering

I think that absolutely makes sense. If both is expected behavior - BindsTo triggers start of depending unit (being variant of Requires) and unit cannot even be started without dependent unit being started, then lone BindsTo with After is simply useless.

evverx · 2016-11-25T17:25:05Z

@arvidjaar ,

I think that absolutely makes sense

I just quote

I am afraid I am rather lost now in actual semantic of both Requires and BindsTo

:-)

@poettering

hmm, that failure is really interesting..

I added some logging

index 854da96..a3100fe 100644
--- a/src/core/unit.c
+++ b/src/core/unit.c
@@ -1540,13 +1540,17 @@ static bool unit_verify_deps(Unit *u) {
          * the job processing, but do not have any effect afterwards. */

         SET_FOREACH(other, u->dependencies[UNIT_BINDS_TO], j)
-                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other)))
+                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other))) {
+                        log_unit_warning(u, "BindsTo=%s is not in active state", other->id);
                         return false;
+                }

         for (i = 0; i < ELEMENTSOF(negative); i++)
                 SET_FOREACH(other, u->dependencies[negative[i]], j)
-                        if (!UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other)))
+                        if (!UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other))) {
+                                log_unit_warning(u, "Conflicts or ConflictedBy %s is not in active state", other->id);
                                 return false;
+                        }

         return true;
 }

Logs (updated):

$ sudo journalctl -D /var/tmp/systemd-test.FjzuB8/journal/222ff1a8f8004b15a3a87e2e653e887e/ | grep -E '(swap|shutdown).target'
Nov 25 06:30:08 systemd-testsuite systemd[1]: swap.target: Installed new job swap.target/stop as 208
Nov 25 06:30:08 systemd-testsuite systemd[1]: shutdown.target: Installed new job shutdown.target/start as 142
Nov 25 06:30:09 systemd-testsuite systemd[1]: swap.target: Merged into installed job swap.target/stop as 208
Nov 25 06:30:09 systemd-testsuite systemd[1]: shutdown.target: Merged into installed job shutdown.target/start as 142
Nov 25 06:32:10 systemd-testsuite systemd[1]: shutdown.target: Installed new job shutdown.target/start as 142
Nov 25 06:32:10 systemd-testsuite systemd[1]: swap.target: Installed new job swap.target/stop as 154
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Conflicts or ConflictedBy swap.target is not in active state
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Job shutdown.target/start finished, result=dependency
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Job shutdown.target/start failed with result 'dependency'.
Nov 25 06:32:15 systemd-testsuite systemd[1]: swap.target changed active -> dead
Nov 25 06:32:15 systemd-testsuite systemd[1]: swap.target: Job swap.target/stop finished, result=done

poettering · 2016-11-29T19:38:01Z

@poettering
whatever happens, don't permit A running if it has BindsTo=B and B isn't running
Do you mean that if A Requires B it is permissible to start A even if B is not up or cannot be brought up? I am afraid I am rather lost now in actual semantic of both Requires and BindsTo.

A unit A with Requires=B doesn't actually care if B is up or not. All that matters is that the start job for it succeeded. But that can succeed even without B actually being up if some ConditionXYZ= failed for the service. After all conditoins are considered "non-fatal". They permit a unit's job to succeed cleanly even if the condition doesn't hold, but of course the unit won't be up afterwards.

poettering · 2016-11-29T20:09:22Z

@evverx hmm, interesting! apparently we are not having an After= dep between swap.target and shutdown.target, but really should. The patch just made this visible. Will fix.

evverx · 2016-11-29T21:20:21Z

src/core/unit.c

+         * the job processing, but do not have any effect afterwards. */
+
+        SET_FOREACH(other, u->dependencies[UNIT_BINDS_TO], j)
+                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other)))


I think we should log something here. Something like #4733 (comment)
Note: I logged an incorrect message for !UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other)) case

makes sense. will add

…are also ordered against it Let's tweak the automatic dependency generation of target units: not only add a Conflicts= towards shutdown.target but also an After= line for it, so that we can be sure the new target is not started when the old target is still up. Discovered in the context of systemd#4733 (Also, exclude dependency generation if for shutdown.target itself. — This is strictly speaking redundant, as unit_add_two_dependencies_by_name() detects that and becomes a NOP, but let's make this explicit for readability.)

poettering · 2016-11-29T22:31:45Z

I force pushed a new version now, that adds more docs, adds the suggested log message, and fixes the umount issues. Please have a look!

evverx · 2016-11-29T23:23:15Z

Execution failed! See logfile for details: Timeout waiting for ping

CentOS failed to reboot

evverx · 2016-11-30T00:43:28Z

Nov 30 00:19:17 systemd[1]: reboot.target: Trying to enqueue job reboot.target/start/replace-irreversibly
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Looking at job systemd-update-utmp.service/verify-active conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Looking at job systemd-update-utmp.service/stop conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Fixing conflicting jobs systemd-update-utmp.service/verify-active,systemd-update-utmp.service/stop by deleting job systemd-update-utmp.service/stop
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Looking at job systemd-update-utmp-runlevel.service/start conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Looking at job systemd-update-utmp-runlevel.service/stop conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Fixing conflicting jobs systemd-update-utmp-runlevel.service/start,systemd-update-utmp-runlevel.service/stop by deleting job systemd-update-utmp-runlevel.service/stop
Nov 30 00:19:17 systemd[1]: systemd-reboot.service: Fixing conflicting jobs systemd-reboot.service/stop,systemd-reboot.service/start by deleting job systemd-reboot.service/stop
Nov 30 00:19:17 systemd[1]: shutdown.target: Deleting job shutdown.target/stop as dependency of job systemd-reboot.service/stop
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Deleting job systemd-update-utmp-runlevel.service/start as dependency of job shutdown.target/stop
Nov 30 00:19:17 systemd[1]: dracut-shutdown.service: Installed new job dracut-shutdown.service/start as 631
Nov 30 00:19:17 systemd[1]: systemd-reboot.service: Installed new job systemd-reboot.service/start as 627
Nov 30 00:19:17 systemd[1]: reboot.target: Installed new job reboot.target/start as 626
Nov 30 00:19:17 systemd[1]: plymouth-reboot.service: Installed new job plymouth-reboot.service/start as 779
Nov 30 00:19:17 systemd[1]: shutdown.target: Installed new job shutdown.target/start as 630
Nov 30 00:19:17 systemd[1]: reboot.target: Enqueued job reboot.target/start as 626
Nov 30 00:19:17 systemd[2017]: Reached target Shutdown.
...
Nov 30 00:19:19 systemd[1]: shutdown.target: Conflicting with unit systemd-update-utmp.service, but unit isn't deactivated.
Nov 30 00:19:19 systemd[1]: shutdown.target: Job shutdown.target/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Shutdown.
Nov 30 00:19:19 systemd[1]: systemd-reboot.service: Job systemd-reboot.service/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Reboot.
Nov 30 00:19:19 systemd[1]: reboot.target: Job reboot.target/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Reboot.
Nov 30 00:19:19 systemd[1]: reboot.target: Job reboot.target/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: systemd-reboot.service: Job systemd-reboot.service/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: shutdown.target: Job shutdown.target/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: reboot.target: Collecting.

evverx · 2016-11-30T00:48:23Z

BTW: systemd-centos-ci: Manually running the tests

…are also ordered against it Let's tweak the automatic dependency generation of target units: not only add a Conflicts= towards shutdown.target but also an After= line for it, so that we can be sure the new target is not started when the old target is still up. Discovered in the context of systemd#4733 (Also, exclude dependency generation if for shutdown.target itself. — This is strictly speaking redundant, as unit_add_two_dependencies_by_name() detects that and becomes a NOP, but let's make this explicit for readability.)

poettering · 2016-12-02T17:39:50Z

OK, I force pushed a new version, that hopefully fixes the issue at hand. Let's see if the CI agrees.

thx1111 · 2017-01-14T14:28:54Z

I read (most of) your mail on the mailing list... but it was really long. Too long ;). Sorry.

Still, thanks for taking a look.

for example the binding unit would be pulled in by the same event which creates the bound-to unit, there'd be race between them both being started, and sometimes the start would succeed, and sometimes not.

To understand the mechanism of your race, I have to understand the mechanism of your BindsTo= dependency, and I do not. I have been considering different possible mechanisms, in the form of specific examples. This is my third attempt, writing and re-writing this post.

It is confusing for me, and paramount to keep in mind, what thing is the subject and object of "After". After what? "After" could refer to 1) queuing a unit for start, or to 2) "successfully" executing the service section of a unit. In practice, it seems that "After" is referring to the "successful" execution of the service section, rather than merely the enqueuing of a unit for start.

If there is no After= dependency between the two units, then both units would have the exact same one start dependency - the event. Both units would be enqueued for start simultaneously. There is no race there.

I might then imagine that your race must operate as a result of some event which occurs after these two units are enqueued for start, and be related to your definition of the BindsTo= dependency.

If BindsTo= is seen as an "edge triggered" reverse stop mechanism, then there is no context for a start race. So then, I will infer that your BindsTo= provides also a forward start mechanism, where, supposing "Unit B Binds To Unit A", then this means that Unit B will be enqueued for start when Unit A has successfully executed its service section.

But that would be of no matter in your example, which assumes that both units have a start dependency on the "same event". The Unit B start dependency will have already been satisfied by that event, and the "activation" of Unit A will cause no change to the activation state of Unit B.

I am still at a loss, trying to hypothesize the mechanism for a start race. Could you please provide a more detailed and precise description of your race?

keszybz · 2017-01-14T22:53:17Z

I am still at a loss, trying to hypothesize the mechanism for a start race. Could you please provide a more detailed and precise description of your race?

The archetypical example is when a device detected by udev has a SYSTEMD_WANTS tag, and the service unit it pulls in has BindTo= on the device. The .device unit is not created immediately, and (at least I think so), the .service unit can be created before the .device unit appears in systemd.

thx1111 · 2017-01-17T18:21:05Z

Thanks Zbyszek. That sounds like my use case - plug in a network device, recognized by udev, and have systemd generate a master device, using a .service file, where the network device is then dependent upon that master device. I have run this on half a dozen machines - from laptops to desktops, 32 bit and 64 bit - but I have never seen anything that looks like a "race", in the sense that the results are different, from event to event, or from machine to machine. Of course, it may be that the "race" always comes-out the same, but that would be an unnecessary and over-complicating assumption.

Why did you think there was a race condition, originally? Was that based upon your knowledge of the code? It may be that perusing the code is required here, and I have only skimmed through it, trying to understand the structure.

If there really is a race here, then that's a problem, and it should be identified. But what evidence are we working with, for now?

keszybz · 2017-01-17T18:33:20Z

See #4725, in particular #4725 (comment).

thx1111 · 2017-01-20T20:17:48Z

Ok, thanks. Looking at that comment:

... The service is bound to the device but not ordered against it. This means startup of the service will not be delayed, as there's nothing to wait for. But as soon as it is up we notice that the device isn't around yet, and are stopped again. This is because at the time the udev rule is run the .device is not yet visible to PID 1. The fix for that is easy: just add After= in additoin to BindsTo= on the device.

That obviously is the wrong interpretation for BindsTo= and the wrong "fix". BindsTo= is an "edge sensitive" sort of dependency, according to the documentation. The behavior of BindsTo= is perhaps being confused with the behavior of Requisite= here. Even then, Requisite= itself would not cause this kind of start-then-stop behavior, because the condition - a requirement that the device already exist - would not have been satisfied in the first place. If BindsTo= is really producing that kind of start-then-stop state sequence, then the systemd logic needs to be corrected.

So, I do not see this as an example of, or evidence of, a race condition. What has been described is simply a logical sequence of states, even though it is a faulty sequence, since the BindsTo= logic has, apparently, been "broken" by the commit at c5a97ed.

poettering · 2017-02-09T16:26:28Z

I force pushed a new, rebased version now. it's much more conservative than the previous ones, and will only recheck the dependencies that have BindsTo= in combination with After=. This also permitted me to drop the cryptsetup change part, as its use of BindsTo= is no longer problematic.

This means, as a net result:

BindsTo= alone (or in combination with Before=): the main effect is that if the dependent unit suddenly goes away, then we will go away too. Also, it will act like Requires=: when the start job for a bindsto unit fails our start job will fail.
BindsTo= in combination with After=: behaviour like in 1. However, on top of that: we'll recheck right before starting that the dependent unit is actually really running.

Note that a start job for a unit where a ConditionXYZ= doesn't hold will not fail, but succeed. This means, if you dependent on such a unit with BindsTo= without After=, then you will still be started and stay running. However, if you depend on it with BindsTo= and with After=, then you will fail to start.

keszybz

I like this version a lot. The new rules are simple enough and make sense.

The tense in documentation and comments is really important: the whole raison d'etre for this PR is dependencies whose status changes over time. I made two suggestions which I think make the statements more correct.

keszybz · 2017-02-14T04:10:22Z

NEWS


+        * When units are about to be started an additional check is now done to
+          ensure that all dependencies of type BindsTo= (when used in
+          combination with After=) are started.


are started → have been started

keszybz · 2017-02-14T04:12:46Z

src/core/unit.c

+                        continue;
+
+                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other))) {
+                        log_unit_notice(u, "Bound to unit %s, but unit isn't activated.", other->id);


activated → active

To explain a bit more: for me, "activated" seems to encompass both being "active", i.e. running at this point in time, and "having been activated", i.e. having a start job enqueued or running. Thus, I prefer "active", for which I think there can be confusion like that.

keszybz · 2017-02-14T04:15:59Z

src/core/unit.c


+        /* Let's make sure that the deps really are in order before we start this. Normally the job engine should have
+         * taken care of this already, but let's check this here again. After all, the dependencies might have changed
+         * due to a reload or because the user forgot to place ordering dependencies in conjunction with BindsTo=. */


Hm, isn't this comment out of date now? We don't require ordering dependencies, so saying that the user "forgot" is misleading. Also, if there are no ordering dependencies, this does not change the requirement dependencies, so that last sentence is wrong ("the dependencies might have changed ... because the user forgot to place ordering dependencies").

keszybz · 2017-02-14T04:22:24Z

man/systemd.unit.xml

+        instead of <varname>Requires=</varname> in order to achieve a system that is more robust when dealing with
+        failing services.</para>
+
+        <para>Note that this dependency type does not imply that the other unit always has to be in activated state


active state?

keszybz · 2017-02-14T04:23:17Z

man/systemd.unit.xml

+        when this unit is running. Specifically: failing condition checks (such as
+        <varname>ConditionPathExists=</varname>, <varname>ConditionPathExists=</varname>, … — see below) do not cause
+        the start job of a unit with a <varname>Requires=</varname> dependency on it to fail. Also, some unit types may
+        deactivate on their own (for example, a service process may decide to exit cleanly, or a device is unplugged by


is unplugged → may be unplugged (to match "may decativate")

keszybz · 2017-02-14T04:23:49Z

man/systemd.unit.xml

+        the start job of a unit with a <varname>Requires=</varname> dependency on it to fail. Also, some unit types may
+        deactivate on their own (for example, a service process may decide to exit cleanly, or a device is unplugged by
+        the user), which is not propagated to units having a <varname>Requires=</varname> dependency. Use the
+        <varname>BindsTo=</varname> dependency type to ensure that a unit may never be in activated state without a


BindsTo= together with After=?

active?

keszybz · 2017-02-14T04:26:00Z

man/systemd.unit.xml

+        <listitem><para>Configures requirement dependencies, very similar in style to
+        <varname>Requires=</varname>. However, this dependency type is stronger: in addition to the effect of
+        <varname>Requires=</varname> it declares that if the unit bound to is stopped, this unit will be stopped
+        too. This means a unit bound to another unit that suddenly enters inactive state will be promptly stopped too.


promptly? Is this different than normal process of a stop job being enqueued? It think this word suggest something special happens here, but that's not true, the unit might take a very long time to stop.

keszybz · 2017-02-14T04:28:02Z

man/systemd.unit.xml

+        running.</para>
+
+        <para>Note that <varname>BindsTo=</varname> does not imply an ordering dependency, but in many cases is best
+        combined with <varname>After=</varname>, see above.</para></listitem>


It'd merge the last two paragraphs. That last sentence could just be "Hence, in many cases it is best to combine BindsTo= with After=.".

…ch start operation of a unit Let's make sure we verify that all BindsTo= are in order before we actually go and dispatch a start operation to a unit. Normally the job queue should already have made sure all deps are in order, but this might not have been sufficient in two cases: a) when the user changes deps during runtime and reloads the daemon, and b) when the user placed BindsTo= dependencies without matching After= dependencies, so that we don't actually wait for the bound to unit to be up before upping also the binding unit. See: systemd#4725

Let's emphasize that both really should be combined with After=.

poettering · 2017-02-14T12:39:30Z

I force pushed a new version now, with all @keszybz points fixed! Thanks for the review!

thx1111 · 2017-02-14T15:35:04Z

Is the new version the git Master branch now, if I want to test this?

poettering · 2017-02-14T16:06:34Z

@thx1111 no, it isn't. This PR is still open. It gets merged only after this PR is merged into master, which hasn't happened yet.

thx1111 · 2017-02-27T22:12:58Z

systemd-git 232.r1130.g293b16734-1
libsystemd-git 232.r1130.g293b16734-1

Here, the physical devices enp3s0 and wlp2s0 exist, and the usb devices, wlp0s26u1u1, wlp0s26u1u2, and enp0s26u1u6, are not installed. The virtual network devices, bond0 and bridge0, must be created.

The unit file file for the "enslaved" devices does not have "After=", and has, literally:

BindsTo= sys-subsystem-net-devices-%i.device
BindsTo= sys-subsystem-net-devices-%p.device

Here, adding Requisite=sys-subsystem-net-devices-%p.device makes no difference.

With this patched version of systemd, while the network devices are, eventually, created properly, systemd still runs the service unit files for the nonexistent devices, and so, the patch completely and utterly fails to address or resolve the original problem. These unit files for the nonexistent devices still run and still result in "Unit entered failed state."

In the log, notice that jobs for both existing and non-existing devices are "removed", and then all of these same jobs are started, nonetheless. From the number of log entries, compared to the normal log entries before the patch, there seems to be a lot of "do nothing" work being done in the job scheduling.

On a systemctl restart ..., the log is less chatty, without the Unnecessary job ... was removed messages.

...
Feb 27 13:21:28 lapis systemd[1]: Started bond0 Interface Master.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit is bound to inactive unit sys-subsystem-net-devices-bond0.device. Stopping, too
Feb 27 13:21:28 lapis systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed to enqueue stop job, ignoring: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: Started bridge0 Interface Master.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit is bound to inactive unit sys-subsystem-net-devices-bridge0.device. Stopping,
Feb 27 13:21:28 lapis systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed to enqueue stop job, ignoring: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u1.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-enp0s26u1u6.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-bond0.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u2.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u2.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u1.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-enp0s26u1u6.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved bond0@bridge0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp0s26u1u1@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp2s0@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved enp0s26u1u6@bridge0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved enp3s0@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting wpa_supplicant on wlp2s0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp0s26u1u2@bond0...
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:28 lapis systemd[1]: Failed to start Enslaved wlp0s26u1u1@bond0.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
...
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved bond0@bridge0.
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved wlp2s0@bond0.
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved enp3s0@bond0.
Feb 27 13:21:29 lapis ip[394]: Cannot find device "enp0s26u1u6"
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:29 lapis systemd[1]: Failed to start Enslaved enp0s26u1u6@bridge0.
Feb 27 13:21:29 lapis ip[386]: Cannot find device "wlp0s26u1u2"
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:29 lapis systemd[1]: Failed to start Enslaved wlp0s26u1u2@bond0.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
...

And then, was there any useful change in functionality actually provided by commit cd8e857 ?

thx1111 · 2017-03-12T02:34:12Z

Do you want to close-out this pull request, and also close-out the bug reports,

"Requisite" dependency fails with "device" units - and with "After" ordering, produces "crazy" errors
#4756

"BindsTo=" must confirm that a device exists before enqueuing a start job on that device.
#4413

and open a new bug report for systemd 233? Or do we continue in this pull request? Or do we continue in one of the other bug reports?

The reason the reload failing is that some magma network devices do not fully come up (e.g. do not get an ip even if set to auto). This triggers this systemd/vagrant issue: systemd/systemd#4733 fgrehm/vagrant-lxc#481 Signed-off-by: Christian Krämer <[email protected]>

… would fail. The reason the reload failing is that some magma network devices do not fully come up (e.g. do not get an ip even if set to auto). This triggers this systemd/vagrant issue: systemd/systemd#4733 fgrehm/vagrant-lxc#481 Signed-off-by: Christian Krämer <[email protected]>

… would fail. (#13622) The reason the reload failing is that some magma network devices do not fully come up (e.g. do not get an ip even if set to auto). This triggers this systemd/vagrant issue: systemd/systemd#4733 fgrehm/vagrant-lxc#481 Signed-off-by: Christian Krämer <[email protected]>

… would fail. The reason the reload failing is that some magma network devices do not fully come up (e.g. do not get an ip even if set to auto). This triggers this systemd/vagrant issue: systemd/systemd#4733 fgrehm/vagrant-lxc#481 Signed-off-by: Christian Krämer <[email protected]>

poettering added the pid1 label Nov 24, 2016

poettering mentioned this pull request Nov 24, 2016

Sometimes sys-subsystem-net-devices-ens2.device is wrongly GCed #4725

Closed

evverx reviewed Nov 24, 2016

View reviewed changes

martinpitt added the needs-discussion 🤔 label Nov 25, 2016

martinpitt added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 25, 2016

evverx reviewed Nov 29, 2016

View reviewed changes

poettering force-pushed the binds-to branch from f500c12 to e7f8963 Compare November 29, 2016 22:30

poettering removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 29, 2016

evverx added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 29, 2016

poettering force-pushed the binds-to branch from e7f8963 to c883fe7 Compare December 2, 2016 17:39

poettering mentioned this pull request Dec 6, 2016

systemd umount /dev/urandom on shutdown, leading to asserts e.g. in lxc #4810

Closed

keszybz added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 16, 2017

poettering added this to the v233 milestone Feb 8, 2017

poettering force-pushed the binds-to branch from 16649a8 to 011ba71 Compare February 9, 2017 16:21

poettering force-pushed the binds-to branch from 011ba71 to 9a45f70 Compare February 9, 2017 16:44

poettering removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 9, 2017

keszybz requested changes Feb 14, 2017

View reviewed changes

keszybz reviewed Feb 14, 2017

View reviewed changes

poettering added 2 commits February 14, 2017 13:38

man: extend the docs on BindsTo= and Requires= a bit

62d3ca2

Let's emphasize that both really should be combined with After=.

poettering force-pushed the binds-to branch from 9a45f70 to 62d3ca2 Compare February 14, 2017 12:39

keszybz approved these changes Feb 15, 2017

View reviewed changes

keszybz merged commit cd8e857 into systemd:master Feb 15, 2017

crasu mentioned this pull request Aug 16, 2022

fix(ci): Moved setting up the hostname to ansible as reloading the vm would fail. magma/magma#13622

Merged

mpfirrmann mentioned this pull request Sep 27, 2022

Assorted packer and dev-vm cleanups magma/magma#13969

Closed

3 tasks

prahal mentioned this pull request Jun 21, 2023

ifupdown : bookworm - bridge interfaces are down at boot debops/debops#2370

Open

Uh oh!

Conversation

poettering commented Nov 24, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

martinpitt commented Nov 25, 2016

Uh oh!

poettering commented Nov 25, 2016

Uh oh!

poettering commented Nov 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

poettering commented Nov 25, 2016

Uh oh!

evverx commented Nov 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

poettering commented Nov 25, 2016

Uh oh!

evverx commented Nov 25, 2016

Uh oh!

poettering commented Nov 25, 2016

Uh oh!

arvidjaar commented Nov 25, 2016

Uh oh!

evverx commented Nov 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

poettering commented Nov 29, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

poettering commented Nov 29, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

poettering commented Nov 29, 2016

Uh oh!

evverx commented Nov 29, 2016

Uh oh!

evverx commented Nov 30, 2016

Uh oh!

evverx commented Nov 30, 2016

Uh oh!

poettering commented Dec 2, 2016

Uh oh!

thx1111 commented Jan 14, 2017

Uh oh!

keszybz commented Jan 14, 2017

Uh oh!

thx1111 commented Jan 17, 2017

Uh oh!

keszybz commented Jan 17, 2017

Uh oh!

thx1111 commented Jan 20, 2017

Uh oh!

poettering commented Feb 9, 2017

Uh oh!

keszybz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

poettering commented Nov 25, 2016 •

edited

Loading

evverx commented Nov 25, 2016 •

edited

Loading

evverx commented Nov 25, 2016 •

edited

Loading

poettering commented Nov 29, 2016 •

edited

Loading

thx1111 commented Feb 27, 2017 •

edited

Loading