
Conversation

@hectorj2f
Contributor

Unit state cache:

Motivation: Inactive units overload the systemd-dbus communication, and this overhead increases linearly with the number of existing inactive units. We therefore decided to research solutions to minimize the workload this causes for fleet. We decided to build a cache to reduce the number of calls from fleet to systemd over dbus. Fleet periodically performs operations to gather the state of the units in the cluster, so for each inactive unit located on a host, a read operation is triggered to obtain its state from dbus. This should give an impression of how these operations can use more resources than desired.

Nevertheless, we considered another, more optimal solution during the development of this cache. It requires modifying the go-dbus library to provide an event-driven mechanism that sends a notification every time an operation affects the state of a unit. These notifications would make it possible to know when the state of a unit changes without constantly querying systemd for it. Obviously we declined this solution as it required modifying third-party software used by fleet.

Implementation

We added a new flag, enable_unitstate_cache, that activates the unit state cache and thereby minimizes the usage of system resources. The unit state cache serves the state of inactive units whenever no operation has recently been triggered on them; otherwise the system queries dbus to get the final state. Consequently, we register the operations that are triggered on units in order to decide whether or not to use the cached state.
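
To make that decision concrete, here is a rough sketch of the gating logic; the identifiers (unitStateCache, stateOf, registerOperation) are hypothetical and not the PR's actual code:

```go
package systemd

// Hypothetical cache: last known states plus a record of units that recently
// had an operation triggered on them.
type unitStateCache struct {
	states    map[string]string // last observed state per unit name
	recentOps map[string]bool   // units with a recently triggered operation
}

func newUnitStateCache() *unitStateCache {
	return &unitStateCache{
		states:    make(map[string]string),
		recentOps: make(map[string]bool),
	}
}

// stateOf returns the cached state for a unit unless an operation was recently
// triggered on it (or the unit is unknown), in which case dbus is queried and
// the cache refreshed.
func (c *unitStateCache) stateOf(name string, queryDBus func(string) (string, error)) (string, error) {
	if !c.recentOps[name] {
		if s, ok := c.states[name]; ok {
			return s, nil // no recent operation: trust the cache
		}
	}
	s, err := queryDBus(name) // recent operation or cache miss: ask dbus
	if err != nil {
		return "", err
	}
	c.states[name] = s
	delete(c.recentOps, name) // operation accounted for; later reads may hit the cache
	return s, nil
}

// registerOperation is called whenever fleet triggers an operation (start,
// stop, unload, ...) on a unit, so the next read bypasses the cache.
func (c *unitStateCache) registerOperation(name string) {
	c.recentOps[name] = true
}
```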

Test scenario:

  • 3 node cluster
  • fleet version 0.11.3
  • CoreOS stable (835.9.0)
  • etcd version 2.2.1

Metrics analyzed:

  • CPU usage
  • Memory usage
  • FS bytes usage
  • NET bytes IN
  • NET bytes OUT
  • NET bytes TOTAL
  • Capacity used

Savings deduced from these experiments:

A brief summary:

  • There is a clear CPU usage reduction when using the unit state cache. In the following experiments, this benefit shows up as an approx. 10% reduction in CPU usage. This amount will grow or shrink with the number of inactive units.
  • There is a clear memory usage reduction when using the unit state cache. In the following experiments, the benefit is up to approx. 100 MB of memory. This amount will also grow or shrink with the number of inactive units.
  • FS bytes used during the experiments slightly INCREASED with the unit state cache implementation.
  • NET bytes IN/OUT used during the experiments slightly INCREASED in the implementation WITHOUT the unit state cache.
  • Capacity used obviously decreased when using the unit state cache.

Experiment 1:

Without any interaction with the cluster, we evaluated how 3 inactive units use the system resources of a host.

#### CPU usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Memory usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### CPU usage WITHOUT unit state cache:

(screenshot)

#### Memory usage WITHOUT unit state cache:

(screenshot)

As shown above, there is a slight difference caused by these periodic queries to dbus to get the state of inactive units.

Experiment 2:

We used a benchmarking tool that starts 900 units in the cluster and then stops and destroys all of them after a timeout.

#### CPU usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### CPU usage WITHOUT unit state cache:

(screenshot)

#### Memory usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Memory usage WITHOUT unit state cache:

(screenshot)

#### FS bytes used of our implementation using a UNIT STATE CACHE:

(screenshot)

#### FS bytes used WITHOUT unit state cache:

(screenshot)

#### Capacity used of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Capacity used WITHOUT unit state cache:

(screenshot)

#### Net bytes IN of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes IN WITHOUT unit state cache:

(screenshot)

#### Net bytes OUT of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes OUT WITHOUT unit state cache:

(screenshot)

#### Net bytes total of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes total WITHOUT unit state cache:

(screenshot)

NOTE: As shown in the figures above (CPU usage, memory usage, ...), there are two spikes, which represent the start operation and the stop/destroy operation performed after a fixed timeout.

@hectorj2f
Contributor Author

I performed a different experiment; this time we had 6 inactive units per host, with a total of 18 inactive units in the whole cluster. We used our benchmarking tool, fleemmer (as above), and deployed 900 units, which were stopped/destroyed after a fixed timeout.

CPU usage WITHOUT unit state cache:

(screenshot)

CPU usage using our implementation of a UNIT STATE CACHE:

(screenshot)

CONCLUSION: There is a slight increase in CPU usage due to the inactive units, approx. 7-8%. However, the CPU usage always increases by a larger percentage when scheduling units, as shown by the spikes.

@jonboulle jonboulle added this to the v0.13.0 milestone Mar 1, 2016
@tixxdz
Contributor

tixxdz commented Mar 11, 2016

@hectorj2f Thank you very much for the detailed information; we will read it again carefully and try to push it forward.

For the moment I'm just a bit curious. You say: "We decided to build a cache to reduce the number of calls from fleet to systemd over dbus. Fleet periodically performs operations to gather the state of the units in the cluster, so for each inactive unit located on a host, a read operation is triggered to obtain its state from dbus. This should give an impression of how these operations can use more resources than desired." and later you also say that it could be solved by updating go-dbus; for me that's the best solution. Your patches seem clean and they don't touch a lot of code. However, I would first ask whether we can support notifications from systemd: Subscribe(), then get the UnitNew() and UnitRemoved() signals (https://www.freedesktop.org/wiki/Software/systemd/dbus/). systemd supports notifications and it should not trigger any load at all... is this currently not possible?

Thank you!

@hectorj2f
Contributor Author

@tixxdz Subscriptions are already used in the GetUnitStates() function of systemd/manager.go. In the code, fleet defines subscribers for every instance to get its status updates, e.g. when running async operations. However, those subscribers are just for loops that use systemd-dbus to get the current state (look at the SubscribeUnitsCustom function in go-dbus). This is done to be aware of any state change.

The dbus Subscribe() function you mentioned reacts to any systemd dbus event, and I believe that is not what we want. We want to define subscribers for specific fleet units (similar to what the subscription set offers), but in a smart way: the subscribers should report asynchronously whenever a state changes, instead of running for loops that query all the states and filter the units.

Indeed, my long-term proposal was to refactor go-dbus to report state updates only when the state changes, rather than by constantly querying it.
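
For reference, here is a minimal, hypothetical use of go-systemd's SubscribeUnitsCustom ("myapp.service" is just a placeholder unit name). As described above, this is still essentially a polling loop over ListUnits at the given interval rather than a push-based notification:

```go
package main

import (
	"fmt"
	"time"

	"github.com/coreos/go-systemd/dbus"
)

func main() {
	conn, err := dbus.New()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Report a unit only when its active/sub state actually changed.
	isChanged := func(oldU, newU *dbus.UnitStatus) bool {
		return oldU.ActiveState != newU.ActiveState || oldU.SubState != newU.SubState
	}
	// Returning true means "ignore this unit"; here everything except myapp.service is ignored.
	filter := func(name string) bool {
		return name != "myapp.service"
	}

	statusCh, errCh := conn.SubscribeUnitsCustom(time.Second, 0, isChanged, filter)
	for {
		select {
		case units := <-statusCh:
			for name, status := range units {
				fmt.Println(name, status) // status is nil if the unit disappeared
			}
		case err := <-errCh:
			fmt.Println("subscription error:", err)
		}
	}
}
```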

@jonboulle
Contributor

For a little history: this was previously purely event-driven - similar to how we previously used etcd - but as with etcd, we could end up missing events and have no way to sanely recover, so we ended up moving this to the reconciliation model as well.

@hectorj2f
Contributor Author

@jonboulle Thanks for the information :). Then it makes more sense to me, since it improved things regarding data consistency and failure recovery. Unfortunately, we also found some situations in which fleet had a different state than the one in systemd, but that is another issue.

@jonboulle Do you think the current solution could have similar problems? Or do you refer to the implementation of dbus watchers?

This PR aims to reduce the workload caused by these periodic unit state calls for all the subscribed units. @tixxdz, the state of the subscribed units is collected via dbus ListUnits calls. Moreover, for all the units that don't appear in the ListUnits response, an explicit getProperties dbus call is triggered per unit.
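
A rough illustration of that pattern with go-systemd's dbus bindings (the collectStates helper and the subscribed slice are hypothetical, not fleet's actual code):

```go
package example

import "github.com/coreos/go-systemd/dbus"

// collectStates gathers unit states with a single ListUnits call and falls
// back to a per-unit GetUnitProperties call for subscribed units that did not
// show up in the listing (typically inactive or unloaded units).
func collectStates(conn *dbus.Conn, subscribed []string) (map[string]string, error) {
	states := make(map[string]string)

	listed, err := conn.ListUnits()
	if err != nil {
		return nil, err
	}
	for _, u := range listed {
		states[u.Name] = u.ActiveState
	}

	for _, name := range subscribed {
		if _, ok := states[name]; ok {
			continue
		}
		// One extra dbus round trip per unit missing from ListUnits.
		props, err := conn.GetUnitProperties(name)
		if err != nil {
			return nil, err
		}
		if s, ok := props["ActiveState"].(string); ok {
			states[name] = s
		}
	}
	return states, nil
}
```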

@hectorj2f
Contributor Author

Do you see any blockers or potential issues with this caching approach, @jonboulle?

@tixxdz
Contributor

tixxdz commented Mar 21, 2016

Hi @hectorj2f
In the meantime, could you please force-push your branch giantswarm:unit_state_cache? That will update the pull request and trigger the functional tests.

Thank you!

@hectorj2f
Contributor Author

@tixxdz Done. However, there is an error that seems to be unrelated :/
https://travis-ci.org/coreos/fleet/jobs/117506419

@hectorj2f
Contributor Author

ping @tixxdz

@tixxdz
Contributor

tixxdz commented Mar 23, 2016

@hectorj2f Yes, thank you for the update! The failure is due to travis-ci. Hopefully we will give it more time next week, as it is a 0.13 priority.

Thanks!

@tixxdz tixxdz self-assigned this Mar 30, 2016
@hectorj2f
Contributor Author

@tixxdz tests are passing, what's next?

@antrik
Contributor

antrik commented Apr 6, 2016

@hectorj2f To be honest, I think the main hurdle is that we are still pretty new to this, and it's hard for any of us to judge whether such a change is indeed a good idea... (while @jonboulle doesn't really have the time to think about this too much, I guess).

I haven't really looked into this myself; one thing that comes to mind, though, is that AIUI the cache gets out of sync if a unit state changes because of "external" factors, not at a direct request of fleet?... How does that deal with crashing units, for example, or unit dependencies fleet is not aware of?

BTW, this is a bit tangential, but there is one bit in your original message I do not agree with: "Obviously we declined this solution as it required modifying third-party software used by fleet." To quote a certain Lennart Poettering: "You own the entire stack." This is free software; you never have to take limitations in the underlying infrastructure for granted and work around them. Fix problems at the source.

@tixxdz
Contributor

tixxdz commented Apr 6, 2016

@hectorj2f Hi, sorry for the delay; we had to handle some other issues. At the same time I'm reviewing issue #478, since it relates to the one exposed here; I'm just trying to get the whole context from your comments and @jonboulle's ;-)

@hectorj2f
Contributor Author

@tixxdz Thanks

How does that deal with crashing units, for example, or unit dependencies fleet is not aware of?

@antrik Well, that has a very simple answer: this cache won't interfere, because the state of the systemd units is handled with the same logic as before. Please have a look at line 242 in systemd/manager.go. This cache only aims to reduce the workload caused by inactive units.

fleetd/fleetd.go Outdated
cfgset.Bool("disable_watches", false, "Disable the use of etcd watches. Increases scheduling latency")
cfgset.Bool("verify_units", false, "DEPRECATED - This option is ignored")
cfgset.String("authorized_keys_file", "", "DEPRECATED - This option is ignored")
cfgset.Bool("enable_unitstate_cache", true, "Enable an unit state cache to minimize the systemd and dbus communication overhead.")
Contributor


Could this be made false by default?

Contributor


Yes. At least when it's merged we'll leave it false by default; others can run it if they want, and after some time it can be switched to true. Thanks
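
Presumably the change just flips the default of the existing flag in fleetd/fleetd.go, along these lines:

```go
cfgset.Bool("enable_unitstate_cache", false, "Enable a unit state cache to minimize the systemd and dbus communication overhead.")
```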

Contributor Author


Done @tixxdz

@hectorj2f
Contributor Author

@tixxdz The PR builds seem to fail. Please ping me when it is fixed so I can push again. Thx

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

@hectorj2f Just an update: we are making the functional tests run both with and without the state cache enabled. We will merge the functional test change first, then later this PR. I may add some minor comments.

Thank you!

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

So, more notes:

  1. Did you investigate putting the transition of states inside, say,
     systemdUnitManager.cache_hashes { hashes map[string]unit.Hash; state string => {active or inactive} } ?

So we would have a new cache_hashes with two fields: when we unload a unit we don't remove it, we put it into the inactive state; the rest of the code should check whether the state is still active, and additions should check whether the state is inactive. This would allow us to use the same cache. On the other hand, I'm not sure whether it would really scale, because with the current implementation we just forget about the old cache (https://github.com/coreos/fleet/pull/1418/files#diff-561bbb797eb2f941c1384271efe28c0bR273), which will be collected, and the registerUnitOperation cache is also updated where appropriate. This of course saves memory, but in the end we may have three hashes alive at the same time; maybe we can achieve the same with two, or with one hash that survives the calls plus one used inside the functions as a temporary hash to update the final one?

  2. Would it be possible to have a benchmark comparison with a higher number of inactive units? Just to make sure before merging.

  3. It would be nice if we put the details of the logic inside functions; say, instead of:
     if m.enableUnitStateCache { delete(m.unitStateCache.registerUnitOperation, name) }

use something like:

func (m *systemdUnitManager) cacheUpdateActiveUnits(blabla....) { if !m.enableUnitStateCache { return } .... continue ....}

This allows us to hide the logic and maybe improve the cache later. A rough sketch of how 1) and 3) could fit together is below.
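
Purely illustrative, not the PR's actual code: the names cachedUnit, cacheMarkActive, and cacheMarkInactive are made up, and unit.Hash is replaced with a plain string to keep the snippet self-contained.

```go
package systemd

import "sync"

// Hypothetical single cache keeping the unit hash together with an
// active/inactive flag, so unloading a unit flips the flag instead of
// deleting the entry.
type cachedUnit struct {
	hash  string // in fleet this would be unit.Hash
	state string // "active" or "inactive"
}

type systemdUnitManager struct {
	mu                   sync.Mutex
	enableUnitStateCache bool
	cacheHashes          map[string]cachedUnit
}

// cacheMarkActive records a (re)loaded unit; the enable check is hidden here,
// as point 3 suggests, instead of being repeated at every call site.
func (m *systemdUnitManager) cacheMarkActive(name, hash string) {
	if !m.enableUnitStateCache {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cacheHashes[name] = cachedUnit{hash: hash, state: "active"}
}

// cacheMarkInactive keeps the entry but marks it inactive when a unit is unloaded.
func (m *systemdUnitManager) cacheMarkInactive(name string) {
	if !m.enableUnitStateCache {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if cu, ok := m.cacheHashes[name]; ok {
		cu.state = "inactive"
		m.cacheHashes[name] = cu
	}
}
```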

So 1) needs further thinking; do you think it makes sense? I need more time to come up with something that can work, otherwise just use what's already here and improve it later.

And for 2) and 3), it would be awesome to update the PR based on these! Then it's just a matter of time before we merge this ;-)

Thanks for your patience.

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

Also, squashing the commits where possible would be nice. BTW, we merged more functional tests for unit states these past days; I guess that's enough, or do you think there are corner cases that should be added to assert the functionality?

@hectorj2f
Contributor Author

@tixxdz I like the idea but I need to think about the corner cases.

Regarding 2) and 3), they are feasible and understandable :).

@tixxdz
Contributor

tixxdz commented Apr 21, 2016

@hectorj2f For 1), if it's too much then please don't bother. Without it, the feature will be an option disabled by default; later, if it can be done, it could be enabled by default for everyone. At the same time, if you use or depend on this, it would be nice to test PR #1556 and confirm.

Thank you!

@kayrus
Contributor

kayrus commented Apr 21, 2016

@hectorj2f I'm not quite sure, but I suppose this improvement should be done on the low-level side. I've started to implement systemd-dbus optimizations within this PR: #1564

@hectorj2f
Contributor Author

@kayrus My concern is mainly the inactive/dead units. I'll test your changes at the end of this week and let you know if there are any improvements. I am currently quite busy with other stuff :-/

@kayrus
Contributor

kayrus commented Apr 28, 2016

@hectorj2f I've reworked my patch a bit and created a systemd pull request. If you wish to test systemd v225, which is currently used by stable CoreOS, please use this patch: endocode/systemd@1fdb12b

I'll try to implement a new d-bus method that will optimize getUnitState soon.

kayrus added a commit to endocode/systemd that referenced this pull request May 3, 2016
Instead of retrieving all properties for each unit, we will pass a list of
units to the GetUnitsStates method and receive the units' states.
Resolves coreos/fleet#1418
kayrus added a commit to endocode/systemd that referenced this pull request May 11, 2016
This new method returns information by unit names. Unlike ListUnitsByPatterns,
this method returns information for inactive and even non-existent units.
Moved the dbus unit reply logic into a separate shared function.
Resolves coreos/fleet#1418
@hectorj2f
Contributor Author

After talking with @kayrus, I think this PR might become obsolete once #1564 is merged. I'd like to test it once that is done.

@jonboulle
Contributor

Let's consider this on hold in favour of #1564
