Skip to content

networkd: add new KeepConfiguration= setting#12511

Merged
keszybz merged 5 commits intosystemd:masterfrom
ssahani:high-avilability-12050
Jun 6, 2019
Merged

networkd: add new KeepConfiguration= setting#12511
keszybz merged 5 commits intosystemd:masterfrom
ssahani:high-avilability-12050

Conversation

@ssahani
Copy link
Contributor

@ssahani ssahani commented May 8, 2019

This looks pretty legit. networkd drops the foreign addresses upon
restart. This becomes a problem in cases where the additional address
is used as a virtual or floating IP in higih-available environments
(where a master IP is swapped to a different service upon failovers,
e.g. by services like Pacemaker, keepalived and alike, usually using
protocols like Corosync, VRRP or CARP). Those addresses are not handled
by systemd-networkd and are therefore purged upon DHCP renewals or
when an update triggers a restart of systemd-networkd, therefore
potentially resulting in downtimes or at least unecessary failovers.

closes #12050

@johannbg
Copy link
Contributor

johannbg commented May 8, 2019

@keszybz can you backport this fix for F29+ ( other downstream distribution maintainers should probably do so too ) once this lands, since this is somewhat of an urgent matter for systemd networkd only deployments as well as any network management stack that uses systemd-networkd ( nm/netplan etc ).

Downstreams/upstream BZ are probably full of misfiled bugs against keepalive, pacemaker etc related to this underlying issues ( and probably bz.rh will be filled with this against RHEL8 once the bussword summit is over )

@ssahani thanks for looking into this :)

@ssahani ssahani changed the title networkd: Allow networkd to work with keepalived / highavialbility networkd: Allow networkd to work with keepalived / high availability May 8, 2019
@yuwata yuwata added the network label May 8, 2019
@yuwata
Copy link
Member

yuwata commented May 9, 2019

I think this also fixes #11575.

@yuwata
Copy link
Member

yuwata commented May 9, 2019

So, the setting is useful not only for HA environment. How about moving CriticalConnection= from [DHCP] section to [Network] section, and introduce CriticalConnection=static (though the name static may be misleading, but I have no idea)?

  • CriticalConnection=yes: do not drop configs on start, and DHCP renew.
  • CriticalConnection=static: do not drop configs on start, but DHCP addresses are dropped on renew.
  • CriticalConnection=no: drop configs on start.

@ssahani
Copy link
Contributor Author

ssahani commented May 9, 2019

We can't move CriticalConnection= to [Network] section because of backward compatibility. We actually do not allow any other servers to configure same interface with networkd because they may cause lots of misconfiguration. But this case makes a tons of sense and describes the purpose HA. We might want to rename but not sure.

@yuwata
Copy link
Member

yuwata commented May 9, 2019

We can't move CriticalConnection= to [Network] section because of backward compatibility.

Of course, CriticalConnection= in [DHCP] still needs to be supported.

We actually do not allow any other servers to configure same interface with networkd because they may cause lots of misconfiguration. But this case makes a tons of sense and describes the purpose HA. We might want to rename but not sure.

I think the name CriticalConnection= already suggests it is something special. Of course, the man page needs to warn about that.

Anyway, apart from the name of the setting, I like this change.

@johannbg You do not need to wait the patch to be backported. Setting CriticalConnection=yes in [DHCP] section (even if you do not use DHCP) should work.

@yuwata
Copy link
Member

yuwata commented May 9, 2019

One reason I do not like the name HighAvailability= is that the setting does not provide HA, but just for HA systems.

@ssahani
Copy link
Contributor Author

ssahani commented May 9, 2019

AllowHighAvailability= ?

@johannbg
Copy link
Contributor

johannbg commented May 9, 2019

@yuwata I'm in midst of setting up test ipv4/ipv6 environment on F30 networkd only setup to test the classic HAProxy + Keepalived setup so I'll try it out but this will need to be backported regardless.

@ssahani Presumably in the future HA application stacks would want to have networkd manage the network settings ( and associated sysctl settings ) for them? ( I would not be surprised if there is very little things missing from networkd to do that )

If so it would need to be something that goes nicely with the rest of the [Network] section so something like below instead of HighAvailability= ( I personally think HighAvailability= is quite self explanatory thou )

Redundant=
Primary=/Secondary=
VIP=
FIP=
VirtualIP=
FloatingIP=
VirtualAddresss=
FloatingAddress=

@ssahani
Copy link
Contributor Author

ssahani commented May 9, 2019

If so it would need to be something that goes nicely with the rest of the [Network] section so something like below instead of HighAvailability= ( I personally think HighAvailability= is quite self explanatory thou )
Redundant=
Primary=/Secondary=
VIP=
FIP=
VirtualIP=
FloatingIP=
VirtualAddresss=
FloatingAddress=

Let's just make it generic and something comes in future also we need not add / remove stuff. Other application which have same requirements and we can't just keep on adding them no ?

@johannbg
Copy link
Contributor

johannbg commented May 9, 2019

Redundant=True is probably the most generic terminology which would cover HA amongst other things.
networkd probably requires an capability-based IPC support (dbus/kdbus/bus1 whatever ) before applications and application stacks can make full use of what networkd is capable of and make such future become a reality.

@ssahani
Copy link
Contributor Author

ssahani commented May 10, 2019

Redundant=True is probably the most generic terminology which would cover HA amongst other things.

Well I am not sure @yuwata WDUT?

networkd probably requires an capability-based IPC support (dbus/kdbus/bus1 whatever ) before applications and application stacks can make full use of what networkd is capable of and make such future become a reality.

yeah varlink just went in and we have dbus stuff to do.

pqarmitage added a commit to pqarmitage/keepalived that referenced this pull request May 13, 2019
Issue acassen#1170 identified that use_vmac didn't work with systemd-networkd
since systemd-networkd was removing IP addresses created by keepalived
(and any other application). It was discovered that systemd-networkd
did not remove IP addresses from ipvlans.

This commit adds support for ipvlans, but to work around the problem,
and because it might have other uses.

Systemd commit - systemd/systemd#12511 has added
configuration options to stop systemd-networkd removing IP addresses
added by other applications, but it is not merged yet, and it will be a
while before all the distros merge it.

Signed-off-by: Quentin Armitage <[email protected]>
@chr4
Copy link

chr4 commented May 13, 2019

In case it helps:

In my tests (although I'm using netplan), setting critical: true (which sets CriticalConnection=True according to the source) did not prevent purging the addresses upon a systemctl restart systemd-networkd.

network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            critical: true
ip a a 1.2.3.4/32 dev eth0
systemctl restart systemd-networkd
ip a |grep 1.2.3.4

(Sorry, had no time to reproduce this using plain systemd-networkd.)

@ssahani ssahani force-pushed the high-avilability-12050 branch from be834e4 to b0fa0b4 Compare May 14, 2019 10:20
@keszybz
Copy link
Member

keszybz commented May 30, 2019

So, the setting is useful not only for HA environment.

Yes, this is not tied in any particular way to HA. The name should be generic.

How about moving CriticalConnection= from [DHCP] section to [Network] section, and introduce CriticalConnection=static

I think so too. Let's go one step futher and use a name that actually describes in plain words what this option does:
KeepConfiguration=yes|no|dhcp|static...,
where "yes" is the same as "dhcp, static" and KeepConfiguration=dhcp is the same as CriticalConnection=yes.

@yuwata
Copy link
Member

yuwata commented May 30, 2019

@ssahani Are you working on this? If not, I will try to implement the option @keszybz suggests.

@ssahani
Copy link
Contributor Author

ssahani commented May 31, 2019

yes I would like to work it over the weekend

@ssahani
Copy link
Contributor Author

ssahani commented Jun 2, 2019

I have updated PTAL @yuwata @keszybz .

@ssahani ssahani force-pushed the high-avilability-12050 branch from e0e2b82 to 0964ef8 Compare June 2, 2019 14:33
@yuwata yuwata changed the title networkd: Allow networkd to work with keepalived / high availability networkd: add new KeepConfiguration= setting Jun 2, 2019
@yuwata
Copy link
Member

yuwata commented Jun 3, 2019

Hmm. How about the following?

  • KeepConfiguration=dhcp-on-stop : do not stop clients on stopping networkd. Defaults to this for backward compatibility.
  • KeepConfiguration=dhcp : equivalent to CriticalConnection=yes. Including dhcp-on-stop.
  • KeepConfiguration=static : do not remove addresses and routes on starting up.
  • KeepConfiguration=yes : Including the above three options.

And then, parser of CriticalConnection= should be updated as an alias of KeepConfiguration=dhcp.

@ssahani If you are busy recently, then can I update your commit accordingly?

BTW, should we split dhcp into dhcp4 and dhcp6? WDYT?

@yuwata yuwata force-pushed the high-avilability-12050 branch from 0964ef8 to 17fb2d6 Compare June 3, 2019 04:21
@yuwata
Copy link
Member

yuwata commented Jun 3, 2019

@ssahani and @keszybz I've slightly updated @ssahani's commit and added several follow-ups. PTAL.

@ssahani Your original commit is saved at https://github.com/yuwata/systemd/tree/ssahani-high-availability-12050-backup.

@yuwata yuwata force-pushed the high-avilability-12050 branch 2 times, most recently from 4dcd2a5 to 160f237 Compare June 3, 2019 07:20
@ssahani
Copy link
Contributor Author

ssahani commented Jun 3, 2019

Thanks for working on this.

  • KeepConfiguration=dhcp-on-stop : do not stop clients on stopping networkd. Defaults to this for backward compatibility.

It's wrong . As soon as DHCP client stops those addresses are not valid any more. I have already explained it.

  • KeepConfiguration=dhcp : equivalent to CriticalConnection=yes. Including dhcp-on-stop.

Yes .

  • KeepConfiguration=static : do not remove addresses and routes on starting up.
  • KeepConfiguration=yes : Including the above three options.

Yes rest are OK

@yuwata
Copy link
Member

yuwata commented Jun 3, 2019

@ssahani Thank you for the comments.

  • KeepConfiguration=dhcp-on-stop : do not stop clients on stopping networkd. Defaults to this for backward compatibility.

It's wrong . As soon as DHCP client stops those addresses are not valid any more. I have already explained it.

Theoretically yes. However,

  • If we do not provide the option and 'fix' the bug, then many network dependent systems have to set KeepConfiguration=dhcp (or CriticalConnection=yes) when people needs to restart networkd. But setting KeepConfiguration=dhcp may introduce unwanted behavior. Or, in other words, we cannot safely restart networkd without the option.

  • The 'bug' causes real buggy situation almost only when networkd just stopped, not restarted. Then, the lease address may be re-assigned to another machine. However, I hope, the usual usecase of networkd is long running daemon, that is, the daemon is only restarted. not stopped so long time. So, the 'bug' merely causes problem.

  • Moreover, the 'bug' has existed for long time... So, I think we should keep the theoretical 'bug' by default.

@ssahani
Copy link
Contributor Author

ssahani commented Jun 3, 2019

Theoretically yes. However,

Not sure what is theoretically mean. It a BUG.

  • If we do not provide the option and 'fix' the bug, then many network dependent systems have to set KeepConfiguration=dhcp (or CriticalConnection=yes) when people needs to restart networkd. But setting KeepConfiguration=dhcp may introduce unwanted behavior. Or, in other words, we cannot safely restart networkd without the option.

It a right behaviour. Once networked stops it should drop the DHCP addresses.

  • The 'bug' causes real buggy situation almost only when networkd just stopped, not restarted. Then, the lease address may be re-assigned to another machine. However, I hope, the usual usecase of networkd is long running daemon, that is, the daemon is only restarted. not stopped so long time. So, the 'bug' merely causes problem.

How we can ensure that. This is a complete myth.

  • Moreover, the 'bug' has existed for long time... So, I think we should keep the theoretical 'bug' by default.

No It's a practical BUG.

@yuwata yuwata force-pushed the high-avilability-12050 branch from 160f237 to 02bb8ca Compare June 3, 2019 17:22
@yuwata
Copy link
Member

yuwata commented Jun 3, 2019

Now I understand that 'stopping DHCP client' != 'stopping networkd'. And I guess one reason why our discussion could not converge is I did not understand that. Sorry.

@ssahani Could you take a look once more? Now, KEEP_CONFIGURATION_DHCP_ON_STOP option is used only when stopping networkd, hopefully restarting networkd.

Of course, the current version still may not answer your request... But, I hope it is better than the previous one.

@ssahani
Copy link
Contributor Author

ssahani commented Jun 4, 2019

Yes precisely dynamic addresses are valid when we keep talking with the server. When the communication breaks we can not ensure those addresses are valid. Thanks this is better for sure. LGTM.

@yuwata yuwata force-pushed the high-avilability-12050 branch 2 times, most recently from 203e7ad to 96738c2 Compare June 4, 2019 11:01
@yuwata
Copy link
Member

yuwata commented Jun 4, 2019

Yey! All green now. @ssahani Thank you for the long discussion. @keszybz Could you take a final look?

ssahani and others added 4 commits June 6, 2019 22:50
The option prevents to drop lease address on stop.
By setting this, we can safely restart networkd.
Also, KeepConfiguration=dhcp drops static foreign addresses and routes.
@keszybz
Copy link
Member

keszybz commented Jun 6, 2019

I think this can go in. It is nice functionality that can be useful in various cases.

Nevertheless, I think that we should rework how DHCP addresses are handled. In particular, when restarting, we shouldn't drop any addresses before the restart. All state is serialized, so it's just a question of adjusting to new configuration after restart. I don't want users to set KeepConfiguration= just be able to restart sytemd-networkd painlessly.

I'll try to do the rebase now myself and force-push to your branch to redo the CI.

@yuwata yuwata force-pushed the high-avilability-12050 branch from 96738c2 to 1e49885 Compare June 6, 2019 14:37
@yuwata
Copy link
Member

yuwata commented Jun 6, 2019

Rebased.

@keszybz keszybz added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed needs-rebase labels Jun 6, 2019
@keszybz keszybz merged commit 08ed12b into systemd:master Jun 6, 2019
@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Jun 6, 2019
@ssahani ssahani deleted the high-avilability-12050 branch August 20, 2019 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

systemd-networkd prunes floating virtual IPs for high-availability environments

6 participants