Skip to content

When we are about to start a unit, check the deps again#4733

Merged
keszybz merged 2 commits intosystemd:masterfrom
poettering:binds-to
Feb 15, 2017
Merged

When we are about to start a unit, check the deps again#4733
keszybz merged 2 commits intosystemd:masterfrom
poettering:binds-to

Conversation

@poettering
Copy link
Member

Handle #4725 a bit better, by making a job for a unit that has a bindsto= but no after= fail ultimately when the bound to unit isn't up when we are about to start it.

* taken care of this already, but let's check this here again. After all, the dependencies might have changed
* due to a reload or because the user forgot to place ordering dependencies in conjunction with BindsTo=. */
if (!unit_verify_deps(u))
return -ENOLINK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading the docs

BindsTo=
Configures requirement dependencies, very similar in style to Requires=, however in addition to this behavior, it also declares that this unit is stopped when any of the units listed
suddenly disappears. Units can suddenly, unexpectedly disappear if a service terminates on its own choice, a device is unplugged or a mount point unmounted without involvement of systemd.

It's not so obvious why BindsTo= requires After=

Also, this commit changes the semantic of the BindsTo=:

-bash-4.3# systemctl cat a --no-pager
# /etc/systemd/system/a.service
[Service]
Type=oneshot
ExecStart=/bin/sh -x -c 'echo Hey'
RemainAfterExit=true

-bash-4.3# systemctl cat b --no-pager
# /etc/systemd/system/b.service
[Unit]
BindsTo=a.service

[Service]
ExecStart=/bin/sh -x -c 'echo B'
Type=oneshot
RemainAfterExit=true

v229 works

this PR:

-bash-4.3# systemctl start b
A dependency job for b.service failed. See 'journalctl -xe' for details.

-bash-4.3# journalctl -u a -u b --no-pager
-- Logs begin at Thu 2016-11-24 22:05:00 UTC, end at Thu 2016-11-24 22:12:25 UTC. --
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Trying to enqueue job b.service/start/replace
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Installed new job a.service/start as 244
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Installed new job b.service/start as 194
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Enqueued job b.service/start as 194
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Job b.service/start finished, result=dependency
Nov 24 22:06:48 systemd-testsuite systemd[1]: Dependency failed for b.service.
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Job b.service/start failed with result 'dependency'.
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Passing 0 fds to service
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: About to execute: /bin/sh -x -c 'echo Hey'
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Forked /bin/sh as 227
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Changed dead -> start
Nov 24 22:06:48 systemd-testsuite systemd[227]: a.service: Executing: /bin/sh -x -c 'echo Hey'
Nov 24 22:06:48 systemd-testsuite systemd[1]: Starting a.service...
Nov 24 22:06:48 systemd-testsuite systemd[1]: b.service: Collecting.
Nov 24 22:06:48 systemd-testsuite sh[227]: + echo Hey
Nov 24 22:06:48 systemd-testsuite sh[227]: Hey
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Child 227 belongs to a.service
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Main process exited, code=exited, status=0/SUCCESS
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Changed start -> exited
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: Job a.service/start finished, result=done
Nov 24 22:06:48 systemd-testsuite systemd[1]: Started a.service.
Nov 24 22:06:48 systemd-testsuite systemd[1]: a.service: cgroup is empty
Nov 24 22:09:35 systemd-testsuite systemd[1]: b.service: Collecting.
Nov 24 22:09:35 systemd-testsuite systemd[1]: b.service: Collecting.

And I'm not sure this is the right thing to do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading #4725 (comment)
Why not

b) add in After= if BindsTo= is used without ordering

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reread the docs. And I don't really understand why BindsTo= requires After=

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed in this context it's not clear at all. You mostly want to say "stop b if a goes away", but don't care about the ordering, and with a normal Requires= it is not a requirement to add After= as well. It still makes sense in the context of .device units (although this is mostly for an internal/technical reason, in the sense that systemd needs to "import" udev devices), but not here indeed.

@martinpitt
Copy link
Contributor

I killed the autopkgtests as they cause trouble: Shutdown hangs with

[[0;32m  OK  [0m] Stopped target Encrypted Volumes.
[[0;32m  OK  [0m] Stopped Forward Password Requests to Wall Directory Watch.
[[0;32m  OK  [0m] Stopped Dispatch Password Requests to Console Directory Watch.
[[0;32m  OK  [0m] Stopped target Swap.
[[0;32m  OK  [0m] Stopped Load/Save Random Seed.
[[0;32m  OK  [0m] Stopped Update UTMP about System Boot/Shutdown.
[[0;32m  OK  [0m] Stopped Network Time Synchronization.
[[0;32m  OK  [0m] Stopped Create Volatile Files and Directories.
[[0;32m  OK  [0m] Stopped target Local File Systems.
         Unmounting /run/lock...
         Unmounting /run/user/1000...
[[0;32m  OK  [0m] Unmounted /run/lock.
[[0;32m  OK  [0m] Unmounted /run/user/1000.
[[0;32m  OK  [0m] Reached target Unmount All Filesystems.
[[0;32m  OK  [0m] Stopped target Local File Systems (Pre).
         Stopping Monitoring of LVM2 mirrors?ng dmeventd or progress polling...
[[0;32m  OK  [0m] Stopped Remount Root and Kernel File Systems.
[[0;32m  OK  [0m] Stopped Create Static Device Nodes in /dev.
[[0;1;33mDEPEND[0m] Dependency failed for Shutdown.

and the testbed never actually reboots, thus it times out waiting for ssh. Looks like the "default" test on Fedora suffers from the same, although the output is a bit hard to read.

@martinpitt martinpitt added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 25, 2016
@poettering
Copy link
Member Author

@martinpitt hmm, that failure is really interesting... Any chance we can get debug logging turned on for such tests? That would be very helpful for tracking down the issue here...

@poettering
Copy link
Member Author

poettering commented Nov 25, 2016

It's not so obvious why BindsTo= requires After=

It doesn't really actually.

BindsTo= just means: whatever happens, don't permit A running if it has BindsTo=B and B isn't running. The man page doesn't say that overly clearly, but it's kinda what already was in place, except in a racy fashion: we'd not detect this at all the right times, but only if a service dies, but not when it is in the process of being started. That was a mistake, and is highly racy: we'd check for bindsto on service exit, but not on service start, so if you combined multiple bindsto on a single service and then have some of them exit and some of them start, then we'd notice that the start wasn't complete yet, just becaus eof the eixt, if you follow at all what I am saying here...

Now, BindsTo= doesn't "require" After=, but it makes a lot of sense to use them together. If you don't then no ordering is applied, and that means if you have A BindsTo= B, and start both, then A might be started before B is ready which would then fail... If you add in the order however it's clear that A shouldn't be started anyway before B is up, hence all is good.

@poettering
Copy link
Member Author

i'll extend the docs a bit, to make clear what "bindsto" really means: i.e. that a Unit is actually really "Bound" to another one, and will never be up without it...

@evverx
Copy link
Contributor

evverx commented Nov 25, 2016

@poettering , well, I see the point. But

a) warn if BindsTo= is used without ordering
b) add in After= if BindsTo= is used without ordering
c) if a unit is upped that bindsto another unit, fail it immediately is the requirements aren't met, with a proper error message in the logs.

Why not b) ?
Why should we break all services with BindsTo= and without After=?

@poettering
Copy link
Member Author

Why not c) ?
Why should we break all services with BindsTo= and without After=?

Well, they are already broken, it's just harder to trigger. There's code in place that shuts down a unit if any of its BindsTo= deps are not up. This code got called under a number of conditions before, but not in all of them, we the device GC case we call it at one more occasion, so it suddenly detects some corner case quicker. Now, the code could also get invoked due to other reasons, hence there was always a race: things were safe only as long as this piece of code never got invoked, but that's a special case really...

Essentially, the GC patch really just made the race more visible, it was always there, and it always was due to user misconfig...

@evverx
Copy link
Contributor

evverx commented Nov 25, 2016

Essentially, the GC patch really just made the race more visible, it was always there, and it always was due to user misconfig...

Right. But [email protected] worked (most of the time) :-)

Anyway, I think you're right.

i'll extend the docs a bit, to make clear what "bindsto" really means

Did you mean the man page? Or the NEWS too?

@poettering
Copy link
Member Author

Did you mean the man page? Or the NEWS too?

I meant the man page, but I figure we can mention this in the NEWS file, too

@arvidjaar
Copy link
Contributor

@poettering

whatever happens, don't permit A running if it has BindsTo=B and B isn't running

Do you mean that if A Requires B it is permissible to start A even if B is not up or cannot be brought up? I am afraid I am rather lost now in actual semantic of both Requires and BindsTo.

@evverx

add in After= if BindsTo= is used without ordering

I think that absolutely makes sense. If both is expected behavior - BindsTo triggers start of depending unit (being variant of Requires) and unit cannot even be started without dependent unit being started, then lone BindsTo with After is simply useless.

@evverx
Copy link
Contributor

evverx commented Nov 25, 2016

@arvidjaar ,

I think that absolutely makes sense

I just quote

I am afraid I am rather lost now in actual semantic of both Requires and BindsTo

:-)

@poettering

hmm, that failure is really interesting..

I added some logging

index 854da96..a3100fe 100644
--- a/src/core/unit.c
+++ b/src/core/unit.c
@@ -1540,13 +1540,17 @@ static bool unit_verify_deps(Unit *u) {
          * the job processing, but do not have any effect afterwards. */

         SET_FOREACH(other, u->dependencies[UNIT_BINDS_TO], j)
-                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other)))
+                if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other))) {
+                        log_unit_warning(u, "BindsTo=%s is not in active state", other->id);
                         return false;
+                }

         for (i = 0; i < ELEMENTSOF(negative); i++)
                 SET_FOREACH(other, u->dependencies[negative[i]], j)
-                        if (!UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other)))
+                        if (!UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other))) {
+                                log_unit_warning(u, "Conflicts or ConflictedBy %s is not in active state", other->id);
                                 return false;
+                        }

         return true;
 }

Logs (updated):

$ sudo journalctl -D /var/tmp/systemd-test.FjzuB8/journal/222ff1a8f8004b15a3a87e2e653e887e/ | grep -E '(swap|shutdown).target'
Nov 25 06:30:08 systemd-testsuite systemd[1]: swap.target: Installed new job swap.target/stop as 208
Nov 25 06:30:08 systemd-testsuite systemd[1]: shutdown.target: Installed new job shutdown.target/start as 142
Nov 25 06:30:09 systemd-testsuite systemd[1]: swap.target: Merged into installed job swap.target/stop as 208
Nov 25 06:30:09 systemd-testsuite systemd[1]: shutdown.target: Merged into installed job shutdown.target/start as 142
Nov 25 06:32:10 systemd-testsuite systemd[1]: shutdown.target: Installed new job shutdown.target/start as 142
Nov 25 06:32:10 systemd-testsuite systemd[1]: swap.target: Installed new job swap.target/stop as 154
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Conflicts or ConflictedBy swap.target is not in active state
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Job shutdown.target/start finished, result=dependency
Nov 25 06:32:14 systemd-testsuite systemd[1]: shutdown.target: Job shutdown.target/start failed with result 'dependency'.
Nov 25 06:32:15 systemd-testsuite systemd[1]: swap.target changed active -> dead
Nov 25 06:32:15 systemd-testsuite systemd[1]: swap.target: Job swap.target/stop finished, result=done

@poettering
Copy link
Member Author

poettering commented Nov 29, 2016

@poettering

whatever happens, don't permit A running if it has BindsTo=B and B isn't running

Do you mean that if A Requires B it is permissible to start A even if B is not up or cannot be brought up? I am afraid I am rather lost now in actual semantic of both Requires and BindsTo.

A unit A with Requires=B doesn't actually care if B is up or not. All that matters is that the start job for it succeeded. But that can succeed even without B actually being up if some ConditionXYZ= failed for the service. After all conditoins are considered "non-fatal". They permit a unit's job to succeed cleanly even if the condition doesn't hold, but of course the unit won't be up afterwards.

@poettering
Copy link
Member Author

@evverx hmm, interesting! apparently we are not having an After= dep between swap.target and shutdown.target, but really should. The patch just made this visible. Will fix.

src/core/unit.c Outdated
* the job processing, but do not have any effect afterwards. */

SET_FOREACH(other, u->dependencies[UNIT_BINDS_TO], j)
if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should log something here. Something like #4733 (comment)
Note: I logged an incorrect message for !UNIT_IS_INACTIVE_OR_FAILED(unit_active_state(other)) case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. will add

poettering added a commit to poettering/systemd that referenced this pull request Nov 29, 2016
…are also ordered against it

Let's tweak the automatic dependency generation of target units: not only add a
Conflicts= towards shutdown.target but also an After= line for it, so that we
can be sure the new target is not started when the old target is still up.

Discovered in the context of systemd#4733

(Also, exclude dependency generation if for shutdown.target itself. — This is
strictly speaking redundant, as unit_add_two_dependencies_by_name() detects
that and becomes a NOP, but let's make this explicit for readability.)
@poettering poettering removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 29, 2016
@poettering
Copy link
Member Author

I force pushed a new version now, that adds more docs, adds the suggested log message, and fixes the umount issues. Please have a look!

@evverx evverx added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 29, 2016
@evverx
Copy link
Contributor

evverx commented Nov 29, 2016

Execution failed! See logfile for details: Timeout waiting for ping

CentOS failed to reboot

@evverx
Copy link
Contributor

evverx commented Nov 30, 2016

Nov 30 00:19:17 systemd[1]: reboot.target: Trying to enqueue job reboot.target/start/replace-irreversibly
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Looking at job systemd-update-utmp.service/verify-active conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Looking at job systemd-update-utmp.service/stop conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp.service: Fixing conflicting jobs systemd-update-utmp.service/verify-active,systemd-update-utmp.service/stop by deleting job systemd-update-utmp.service/stop
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Looking at job systemd-update-utmp-runlevel.service/start conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Looking at job systemd-update-utmp-runlevel.service/stop conflicted_by=no
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Fixing conflicting jobs systemd-update-utmp-runlevel.service/start,systemd-update-utmp-runlevel.service/stop by deleting job systemd-update-utmp-runlevel.service/stop
Nov 30 00:19:17 systemd[1]: systemd-reboot.service: Fixing conflicting jobs systemd-reboot.service/stop,systemd-reboot.service/start by deleting job systemd-reboot.service/stop
Nov 30 00:19:17 systemd[1]: shutdown.target: Deleting job shutdown.target/stop as dependency of job systemd-reboot.service/stop
Nov 30 00:19:17 systemd[1]: systemd-update-utmp-runlevel.service: Deleting job systemd-update-utmp-runlevel.service/start as dependency of job shutdown.target/stop
Nov 30 00:19:17 systemd[1]: dracut-shutdown.service: Installed new job dracut-shutdown.service/start as 631
Nov 30 00:19:17 systemd[1]: systemd-reboot.service: Installed new job systemd-reboot.service/start as 627
Nov 30 00:19:17 systemd[1]: reboot.target: Installed new job reboot.target/start as 626
Nov 30 00:19:17 systemd[1]: plymouth-reboot.service: Installed new job plymouth-reboot.service/start as 779
Nov 30 00:19:17 systemd[1]: shutdown.target: Installed new job shutdown.target/start as 630
Nov 30 00:19:17 systemd[1]: reboot.target: Enqueued job reboot.target/start as 626
Nov 30 00:19:17 systemd[2017]: Reached target Shutdown.
...
Nov 30 00:19:19 systemd[1]: shutdown.target: Conflicting with unit systemd-update-utmp.service, but unit isn't deactivated.
Nov 30 00:19:19 systemd[1]: shutdown.target: Job shutdown.target/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Shutdown.
Nov 30 00:19:19 systemd[1]: systemd-reboot.service: Job systemd-reboot.service/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Reboot.
Nov 30 00:19:19 systemd[1]: reboot.target: Job reboot.target/start finished, result=dependency
Nov 30 00:19:19 systemd[1]: Dependency failed for Reboot.
Nov 30 00:19:19 systemd[1]: reboot.target: Job reboot.target/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: systemd-reboot.service: Job systemd-reboot.service/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: shutdown.target: Job shutdown.target/start failed with result 'dependency'.
Nov 30 00:19:19 systemd[1]: reboot.target: Collecting.

@evverx
Copy link
Contributor

evverx commented Nov 30, 2016

poettering added a commit to poettering/systemd that referenced this pull request Dec 2, 2016
…are also ordered against it

Let's tweak the automatic dependency generation of target units: not only add a
Conflicts= towards shutdown.target but also an After= line for it, so that we
can be sure the new target is not started when the old target is still up.

Discovered in the context of systemd#4733

(Also, exclude dependency generation if for shutdown.target itself. — This is
strictly speaking redundant, as unit_add_two_dependencies_by_name() detects
that and becomes a NOP, but let's make this explicit for readability.)
@poettering
Copy link
Member Author

OK, I force pushed a new version, that hopefully fixes the issue at hand. Let's see if the CI agrees.

@thx1111
Copy link

thx1111 commented Jan 14, 2017

I read (most of) your mail on the mailing list... but it was really long. Too long ;). Sorry.

Still, thanks for taking a look.

for example the binding unit would be pulled in by the same event which creates the bound-to unit, there'd be race between them both being started, and sometimes the start would succeed, and sometimes not.

To understand the mechanism of your race, I have to understand the mechanism of your BindsTo= dependency, and I do not. I have been considering different possible mechanisms, in the form of specific examples. This is my third attempt, writing and re-writing this post.

It is confusing for me, and paramount to keep in mind, what thing is the subject and object of "After". After what? "After" could refer to 1) queuing a unit for start, or to 2) "successfully" executing the service section of a unit. In practice, it seems that "After" is referring to the "successful" execution of the service section, rather than merely the enqueuing of a unit for start.

If there is no After= dependency between the two units, then both units would have the exact same one start dependency - the event. Both units would be enqueued for start simultaneously. There is no race there.

I might then imagine that your race must operate as a result of some event which occurs after these two units are enqueued for start, and be related to your definition of the BindsTo= dependency.

If BindsTo= is seen as an "edge triggered" reverse stop mechanism, then there is no context for a start race. So then, I will infer that your BindsTo= provides also a forward start mechanism, where, supposing "Unit B Binds To Unit A", then this means that Unit B will be enqueued for start when Unit A has successfully executed its service section.

But that would be of no matter in your example, which assumes that both units have a start dependency on the "same event". The Unit B start dependency will have already been satisfied by that event, and the "activation" of Unit A will cause no change to the activation state of Unit B.

I am still at a loss, trying to hypothesize the mechanism for a start race. Could you please provide a more detailed and precise description of your race?

@keszybz
Copy link
Member

keszybz commented Jan 14, 2017

I am still at a loss, trying to hypothesize the mechanism for a start race. Could you please provide a more detailed and precise description of your race?

The archetypical example is when a device detected by udev has a SYSTEMD_WANTS tag, and the service unit it pulls in has BindTo= on the device. The .device unit is not created immediately, and (at least I think so), the .service unit can be created before the .device unit appears in systemd.

@keszybz keszybz added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 16, 2017
@thx1111
Copy link

thx1111 commented Jan 17, 2017

Thanks Zbyszek. That sounds like my use case - plug in a network device, recognized by udev, and have systemd generate a master device, using a .service file, where the network device is then dependent upon that master device. I have run this on half a dozen machines - from laptops to desktops, 32 bit and 64 bit - but I have never seen anything that looks like a "race", in the sense that the results are different, from event to event, or from machine to machine. Of course, it may be that the "race" always comes-out the same, but that would be an unnecessary and over-complicating assumption.

Why did you think there was a race condition, originally? Was that based upon your knowledge of the code? It may be that perusing the code is required here, and I have only skimmed through it, trying to understand the structure.

If there really is a race here, then that's a problem, and it should be identified. But what evidence are we working with, for now?

@keszybz
Copy link
Member

keszybz commented Jan 17, 2017

See #4725, in particular #4725 (comment).

@thx1111
Copy link

thx1111 commented Jan 20, 2017

Ok, thanks. Looking at that comment:

... The service is bound to the device but not ordered against it. This means startup of the service will not be delayed, as there's nothing to wait for. But as soon as it is up we notice that the device isn't around yet, and are stopped again. This is because at the time the udev rule is run the .device is not yet visible to PID 1. The fix for that is easy: just add After= in additoin to BindsTo= on the device.

That obviously is the wrong interpretation for BindsTo= and the wrong "fix". BindsTo= is an "edge sensitive" sort of dependency, according to the documentation. The behavior of BindsTo= is perhaps being confused with the behavior of Requisite= here. Even then, Requisite= itself would not cause this kind of start-then-stop behavior, because the condition - a requirement that the device already exist - would not have been satisfied in the first place. If BindsTo= is really producing that kind of start-then-stop state sequence, then the systemd logic needs to be corrected.

So, I do not see this as an example of, or evidence of, a race condition. What has been described is simply a logical sequence of states, even though it is a faulty sequence, since the BindsTo= logic has, apparently, been "broken" by the commit at c5a97ed.

@poettering poettering added this to the v233 milestone Feb 8, 2017
@poettering
Copy link
Member Author

I force pushed a new, rebased version now. it's much more conservative than the previous ones, and will only recheck the dependencies that have BindsTo= in combination with After=. This also permitted me to drop the cryptsetup change part, as its use of BindsTo= is no longer problematic.

This means, as a net result:

  1. BindsTo= alone (or in combination with Before=): the main effect is that if the dependent unit suddenly goes away, then we will go away too. Also, it will act like Requires=: when the start job for a bindsto unit fails our start job will fail.
  2. BindsTo= in combination with After=: behaviour like in 1. However, on top of that: we'll recheck right before starting that the dependent unit is actually really running.

Note that a start job for a unit where a ConditionXYZ= doesn't hold will not fail, but succeed. This means, if you dependent on such a unit with BindsTo= without After=, then you will still be started and stay running. However, if you depend on it with BindsTo= and with After=, then you will fail to start.

@poettering poettering removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Feb 9, 2017
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this version a lot. The new rules are simple enough and make sense.

The tense in documentation and comments is really important: the whole raison d'etre for this PR is dependencies whose status changes over time. I made two suggestions which I think make the statements more correct.

NEWS Outdated

* When units are about to be started an additional check is now done to
ensure that all dependencies of type BindsTo= (when used in
combination with After=) are started.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are started → have been started

src/core/unit.c Outdated
continue;

if (!UNIT_IS_ACTIVE_OR_RELOADING(unit_active_state(other))) {
log_unit_notice(u, "Bound to unit %s, but unit isn't activated.", other->id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activated → active

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain a bit more: for me, "activated" seems to encompass both being "active", i.e. running at this point in time, and "having been activated", i.e. having a start job enqueued or running. Thus, I prefer "active", for which I think there can be confusion like that.

src/core/unit.c Outdated

/* Let's make sure that the deps really are in order before we start this. Normally the job engine should have
* taken care of this already, but let's check this here again. After all, the dependencies might have changed
* due to a reload or because the user forgot to place ordering dependencies in conjunction with BindsTo=. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, isn't this comment out of date now? We don't require ordering dependencies, so saying that the user "forgot" is misleading. Also, if there are no ordering dependencies, this does not change the requirement dependencies, so that last sentence is wrong ("the dependencies might have changed ... because the user forgot to place ordering dependencies").

instead of <varname>Requires=</varname> in order to achieve a system that is more robust when dealing with
failing services.</para>

<para>Note that this dependency type does not imply that the other unit always has to be in activated state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

active state?

when this unit is running. Specifically: failing condition checks (such as
<varname>ConditionPathExists=</varname>, <varname>ConditionPathExists=</varname>, … — see below) do not cause
the start job of a unit with a <varname>Requires=</varname> dependency on it to fail. Also, some unit types may
deactivate on their own (for example, a service process may decide to exit cleanly, or a device is unplugged by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is unplugged → may be unplugged (to match "may decativate")

the start job of a unit with a <varname>Requires=</varname> dependency on it to fail. Also, some unit types may
deactivate on their own (for example, a service process may decide to exit cleanly, or a device is unplugged by
the user), which is not propagated to units having a <varname>Requires=</varname> dependency. Use the
<varname>BindsTo=</varname> dependency type to ensure that a unit may never be in activated state without a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BindsTo= together with After=?

active?

<listitem><para>Configures requirement dependencies, very similar in style to
<varname>Requires=</varname>. However, this dependency type is stronger: in addition to the effect of
<varname>Requires=</varname> it declares that if the unit bound to is stopped, this unit will be stopped
too. This means a unit bound to another unit that suddenly enters inactive state will be promptly stopped too.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promptly? Is this different than normal process of a stop job being enqueued? It think this word suggest something special happens here, but that's not true, the unit might take a very long time to stop.

running.</para>

<para>Note that <varname>BindsTo=</varname> does not imply an ordering dependency, but in many cases is best
combined with <varname>After=</varname>, see above.</para></listitem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd merge the last two paragraphs. That last sentence could just be "Hence, in many cases it is best to combine BindsTo= with After=.".

…ch start operation of a unit

Let's make sure we verify that all BindsTo= are in order before we actually go
and dispatch a start operation to a unit. Normally the job queue should already
have made sure all deps are in order, but this might not have been sufficient
in two cases: a) when the user changes deps during runtime and reloads the
daemon, and b) when the user placed BindsTo= dependencies without matching
After= dependencies, so that we don't actually wait for the bound to unit to be
up before upping also the binding unit.

See: systemd#4725
Let's emphasize that both really should be combined with After=.
@poettering
Copy link
Member Author

I force pushed a new version now, with all @keszybz points fixed! Thanks for the review!

@thx1111
Copy link

thx1111 commented Feb 14, 2017

Is the new version the git Master branch now, if I want to test this?

@poettering
Copy link
Member Author

@thx1111 no, it isn't. This PR is still open. It gets merged only after this PR is merged into master, which hasn't happened yet.

@keszybz keszybz merged commit cd8e857 into systemd:master Feb 15, 2017
@thx1111
Copy link

thx1111 commented Feb 27, 2017

systemd-git 232.r1130.g293b16734-1
libsystemd-git 232.r1130.g293b16734-1

Here, the physical devices enp3s0 and wlp2s0 exist, and the usb devices, wlp0s26u1u1, wlp0s26u1u2, and enp0s26u1u6, are not installed. The virtual network devices, bond0 and bridge0, must be created.

The unit file file for the "enslaved" devices does not have "After=", and has, literally:

BindsTo= sys-subsystem-net-devices-%i.device
BindsTo= sys-subsystem-net-devices-%p.device

Here, adding Requisite=sys-subsystem-net-devices-%p.device makes no difference.

With this patched version of systemd, while the network devices are, eventually, created properly, systemd still runs the service unit files for the nonexistent devices, and so, the patch completely and utterly fails to address or resolve the original problem. These unit files for the nonexistent devices still run and still result in "Unit entered failed state."

In the log, notice that jobs for both existing and non-existing devices are "removed", and then all of these same jobs are started, nonetheless. From the number of log entries, compared to the normal log entries before the patch, there seems to be a lot of "do nothing" work being done in the job scheduling.

On a systemctl restart ..., the log is less chatty, without the Unnecessary job ... was removed messages.

...
Feb 27 13:21:28 lapis systemd[1]: Started bond0 Interface Master.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit is bound to inactive unit sys-subsystem-net-devices-bond0.device. Stopping, too
Feb 27 13:21:28 lapis systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed to enqueue stop job, ignoring: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: Started bridge0 Interface Master.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit is bound to inactive unit sys-subsystem-net-devices-bridge0.device. Stopping,
Feb 27 13:21:28 lapis systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed to enqueue stop job, ignoring: Transaction is destructive.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u1.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-enp0s26u1u6.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-bond0.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u2.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u2.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-wlp0s26u1u1.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Unnecessary job for sys-subsystem-net-devices-enp0s26u1u6.device was removed.
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved bond0@bridge0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp0s26u1u1@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp2s0@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved enp0s26u1u6@bridge0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved enp3s0@bond0...
Feb 27 13:21:28 lapis systemd[1]: Starting wpa_supplicant on wlp2s0...
Feb 27 13:21:28 lapis systemd[1]: Starting Enslaved wlp0s26u1u2@bond0...
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:28 lapis systemd[1]: Failed to start Enslaved wlp0s26u1u1@bond0.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:28 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
...
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved bond0@bridge0.
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved wlp2s0@bond0.
Feb 27 13:21:29 lapis systemd[1]: Started Enslaved enp3s0@bond0.
Feb 27 13:21:29 lapis ip[394]: Cannot find device "enp0s26u1u6"
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:29 lapis systemd[1]: Failed to start Enslaved enp0s26u1u6@bridge0.
Feb 27 13:21:29 lapis ip[386]: Cannot find device "wlp0s26u1u2"
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Feb 27 13:21:29 lapis systemd[1]: Failed to start Enslaved wlp0s26u1u2@bond0.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Unit entered failed state.
Feb 27 13:21:29 lapis systemd[1]: [email protected]: Failed with result 'exit-code'.
...

And then, was there any useful change in functionality actually provided by commit cd8e857 ?

@thx1111
Copy link

thx1111 commented Mar 12, 2017

Do you want to close-out this pull request, and also close-out the bug reports,

"Requisite" dependency fails with "device" units - and with "After" ordering, produces "crazy" errors
#4756

"BindsTo=" must confirm that a device exists before enqueuing a start job on that device.
#4413

and open a new bug report for systemd 233? Or do we continue in this pull request? Or do we continue in one of the other bug reports?

crasu pushed a commit to crasu/magma that referenced this pull request Aug 16, 2022
The reason the reload failing is that some magma network devices do not
fully come up (e.g. do not get an ip even if set to auto).

This triggers this systemd/vagrant issue:
systemd/systemd#4733
fgrehm/vagrant-lxc#481

Signed-off-by: Christian Krämer <[email protected]>
crasu pushed a commit to crasu/magma that referenced this pull request Aug 16, 2022
… would fail.

The reason the reload failing is that some magma network devices do not fully come up
(e.g. do not get an ip even if set to auto).  This triggers this systemd/vagrant issue:
systemd/systemd#4733
fgrehm/vagrant-lxc#481

Signed-off-by: Christian Krämer <[email protected]>
crasu pushed a commit to magma/magma that referenced this pull request Aug 17, 2022
… would fail. (#13622)

The reason the reload failing is that some magma network devices do not fully come up
(e.g. do not get an ip even if set to auto).  This triggers this systemd/vagrant issue:
systemd/systemd#4733
fgrehm/vagrant-lxc#481

Signed-off-by: Christian Krämer <[email protected]>
voisey pushed a commit to voisey/magma that referenced this pull request Aug 18, 2022
… would fail.

The reason the reload failing is that some magma network devices do not fully come up
(e.g. do not get an ip even if set to auto).  This triggers this systemd/vagrant issue:
systemd/systemd#4733
fgrehm/vagrant-lxc#481

Signed-off-by: Christian Krämer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

6 participants