
Conversation

@hectorj2f
Contributor

Unit state cache:

Motivation: Inactive units overload the systemd-dbus communication, and this overhead increases linearly with the number of existing inactive units. We therefore decided to research solutions to minimize the workload this causes for fleet. We decided to build a cache to reduce the number of calls from fleet to systemd over dbus. Fleet periodically performs operations to gather the state of the units in the cluster, so for each inactive unit located on a host, a read operation is triggered to obtain its state from dbus. This should give an impression of how these operations can use more resources than desired.

Nevertheless, we considered another, more optimal solution during the development of this cache. It requires modifying the go-dbus library to provide an event-driven mechanism that sends a notification every time an operation affects the state of a unit. These notifications would make it possible to know when the state of a unit changes without constantly querying systemd for it. Obviously we declined this solution as it required modifying third-party software used by fleet.

Implementation

We added a new flag, enable_unitstate_cache, that activates the unit state cache and thereby minimizes the usage of system resources. The unit state cache serves the state of inactive units whenever no operation has recently been triggered on them; otherwise the system queries dbus to get the final state. Consequently, we register the operations that are triggered on units in order to decide whether or not to use the cached state.
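
To make that decision concrete, here is a rough sketch of the gating logic; the identifiers (unitStateCache, stateOf, registerOperation) are hypothetical and not the PR's actual code:

```go
package systemd

// Hypothetical cache: last known states plus a record of units that recently
// had an operation triggered on them.
type unitStateCache struct {
	states    map[string]string // last observed state per unit name
	recentOps map[string]bool   // units with a recently triggered operation
}

func newUnitStateCache() *unitStateCache {
	return &unitStateCache{
		states:    make(map[string]string),
		recentOps: make(map[string]bool),
	}
}

// stateOf returns the cached state for a unit unless an operation was recently
// triggered on it (or the unit is unknown), in which case dbus is queried and
// the cache refreshed.
func (c *unitStateCache) stateOf(name string, queryDBus func(string) (string, error)) (string, error) {
	if !c.recentOps[name] {
		if s, ok := c.states[name]; ok {
			return s, nil // no recent operation: trust the cache
		}
	}
	s, err := queryDBus(name) // recent operation or cache miss: ask dbus
	if err != nil {
		return "", err
	}
	c.states[name] = s
	delete(c.recentOps, name) // operation accounted for; later reads may hit the cache
	return s, nil
}

// registerOperation is called whenever fleet triggers an operation (start,
// stop, unload, ...) on a unit, so the next read bypasses the cache.
func (c *unitStateCache) registerOperation(name string) {
	c.recentOps[name] = true
}
```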

Test scenario:

  • 3 node cluster
  • fleet version 0.11.3
  • CoreOS stable (835.9.0)
  • etcd version 2.2.1

Metrics analyzed:

  • CPU usage
  • Memory usage
  • FS bytes usage
  • NET bytes IN
  • NET bytes OUT
  • NET bytes TOTAL
  • Capacity used

Savings deduced from these experiments:

A brief summary:

  • There is a clear CPU usage reduction when using the unit state cache. In the following experiments, this benefit shows up as an approx. 10% reduction in CPU usage. This amount will grow or shrink with the number of inactive units.
  • There is a clear memory usage reduction when using the unit state cache. In the following experiments, the benefit is up to approx. 100 MB of memory. This amount will also grow or shrink with the number of inactive units.
  • FS bytes used during the experiments slightly INCREASED with the unit state cache implementation.
  • NET bytes IN/OUT used during the experiments slightly INCREASED in the implementation WITHOUT the unit state cache.
  • Capacity used obviously decreased when using the unit state cache.

Experiment 1:

Without any interaction with the cluster, we evaluated how 3 inactive units use the system resources of a host.

#### CPU usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Memory usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### CPU usage WITHOUT unit state cache:

(screenshot)

#### Memory usage WITHOUT unit state cache:

(screenshot)

As shown above, there is a slight difference caused by these periodic queries to dbus to get the state of inactive units.

Experiment 2:

We used a benchmarking tool that starts 900 units in the cluster and then stops and destroys all of them after a timeout.

#### CPU usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### CPU usage WITHOUT unit state cache:

(screenshot)

#### Memory usage of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Memory usage WITHOUT unit state cache:

(screenshot)

#### FS bytes used of our implementation using a UNIT STATE CACHE:

(screenshot)

#### FS bytes used WITHOUT unit state cache:

(screenshot)

#### Capacity used of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Capacity used WITHOUT unit state cache:

(screenshot)

#### Net bytes IN of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes IN WITHOUT unit state cache:

(screenshot)

#### Net bytes OUT of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes OUT WITHOUT unit state cache:

(screenshot)

#### Net bytes total of our implementation using a UNIT STATE CACHE:

(screenshot)

#### Net bytes total WITHOUT unit state cache:

(screenshot)

NOTE: As shown in the figures above (CPU usage, memory usage, ...), there are two spikes, which represent the start operation and the stop/destroy operation performed after a fixed timeout.

@hectorj2f
Contributor Author

I performed a different experiment; this time we had 6 inactive units per host, with a total of 18 inactive units in the whole cluster. We used our benchmarking tool, fleemmer (as above), and deployed 900 units, which were stopped/destroyed after a fixed timeout.

CPU usage WITHOUT unit state cache:

(screenshot)

CPU usage using our implementation of a UNIT STATE CACHE:

(screenshot)

CONCLUSION: There is a slight increase in CPU usage due to the inactive units, approx. 7-8%. However, the CPU usage always increases by a larger percentage when scheduling units, as shown by the spikes.

@jonboulle jonboulle added this to the v0.13.0 milestone Mar 1, 2016
@tixxdz
Contributor

tixxdz commented Mar 11, 2016

@hectorj2f Thank you very much for the detailed information; we will read it again carefully and try to push it forward.

For the moment I'm just a bit curious. You say: "We decided to build a cache to reduce the number of calls from fleet to systemd over dbus. Fleet periodically performs operations to gather the state of the units in the cluster, so for each inactive unit located on a host, a read operation is triggered to obtain its state from dbus. This should give an impression of how these operations can use more resources than desired." and later you also say that it could be solved by updating go-dbus; for me that's the best solution. Your patches seem clean and they don't touch a lot of code. However, I would first ask whether we can support notifications from systemd: Subscribe(), then get the UnitNew() and UnitRemoved() signals (https://www.freedesktop.org/wiki/Software/systemd/dbus/). systemd supports notifications and it should not trigger any load at all... is this currently not possible?

Thank you!

@hectorj2f
Contributor Author

@tixxdz Subscriptions are already used in the GetUnitStates() function of systemd/manager.go. In the code, fleet defines subscribers for every instance to get its status updates, e.g. when running async operations. However, those subscribers are just for loops that use systemd-dbus to get the current state (look at the SubscribeUnitsCustom function in go-dbus). This is done to be aware of any state change.

The dbus Subscribe() function you mentioned reacts to any systemd dbus event, and I believe that is not what we want. We want to define subscribers for specific fleet units (similar to what the subscription set offers), but in a smart way: the subscribers should report asynchronously whenever a state changes, instead of running for loops that query all the states and filter the units.

Indeed, my long-term proposal was to refactor go-dbus to report state updates only when the state changes, rather than by constantly querying it.
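
For reference, here is a minimal, hypothetical use of go-systemd's SubscribeUnitsCustom ("myapp.service" is just a placeholder unit name). As described above, this is still essentially a polling loop over ListUnits at the given interval rather than a push-based notification:

```go
package main

import (
	"fmt"
	"time"

	"github.com/coreos/go-systemd/dbus"
)

func main() {
	conn, err := dbus.New()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Report a unit only when its active/sub state actually changed.
	isChanged := func(oldU, newU *dbus.UnitStatus) bool {
		return oldU.ActiveState != newU.ActiveState || oldU.SubState != newU.SubState
	}
	// Returning true means "ignore this unit"; here everything except myapp.service is ignored.
	filter := func(name string) bool {
		return name != "myapp.service"
	}

	statusCh, errCh := conn.SubscribeUnitsCustom(time.Second, 0, isChanged, filter)
	for {
		select {
		case units := <-statusCh:
			for name, status := range units {
				fmt.Println(name, status) // status is nil if the unit disappeared
			}
		case err := <-errCh:
			fmt.Println("subscription error:", err)
		}
	}
}
```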

@jonboulle
Contributor

For a little history: this was previously purely event-driven - similar to how we previously used etcd - but as with etcd, we could end up missing events and have no way to sanely recover, so we ended up moving this to the reconciliation model as well.

@hectorj2f
Contributor Author

@jonboulle Thanks for the information :). Then it makes more sense to me, since it improved things regarding data consistency and failure recovery. Unfortunately, we also found some situations in which fleet had a different state than the one in systemd, but that is another issue.

@jonboulle Do you think the current solution could have similar problems? Or do you refer to the implementation of dbus watchers?

This PR aims to reduce the workload caused by these periodic unit state calls for all the subscribed units. @tixxdz, the state of the subscribed units is collected via dbus ListUnits calls. Moreover, for all the units that don't appear in the ListUnits response, an explicit getProperties dbus call is triggered per unit.
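
A rough illustration of that pattern with go-systemd's dbus bindings (the collectStates helper and the subscribed slice are hypothetical, not fleet's actual code):

```go
package example

import "github.com/coreos/go-systemd/dbus"

// collectStates gathers unit states with a single ListUnits call and falls
// back to a per-unit GetUnitProperties call for subscribed units that did not
// show up in the listing (typically inactive or unloaded units).
func collectStates(conn *dbus.Conn, subscribed []string) (map[string]string, error) {
	states := make(map[string]string)

	listed, err := conn.ListUnits()
	if err != nil {
		return nil, err
	}
	for _, u := range listed {
		states[u.Name] = u.ActiveState
	}

	for _, name := range subscribed {
		if _, ok := states[name]; ok {
			continue
		}
		// One extra dbus round trip per unit missing from ListUnits.
		props, err := conn.GetUnitProperties(name)
		if err != nil {
			return nil, err
		}
		if s, ok := props["ActiveState"].(string); ok {
			states[name] = s
		}
	}
	return states, nil
}
```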

@hectorj2f
Contributor Author

Do you see any blockers or potential issues with this caching approach, @jonboulle?

@tixxdz
Contributor

tixxdz commented Mar 21, 2016

Hi @hectorj2f
In the meantime, could you please force-push your branch giantswarm:unit_state_cache? That will update the pull request and trigger the functional tests.

Thank you!

@hectorj2f
Contributor Author

@tixxdz Done. However, there is an error that seems to be unrelated :/
https://travis-ci.org/coreos/fleet/jobs/117506419

@hectorj2f
Contributor Author

ping @tixxdz

@tixxdz
Contributor

tixxdz commented Mar 23, 2016

@hectorj2f Yes, thank you for the update! The failure is due to travis-ci. Hopefully we will give it more time next week, as it is a 0.13 priority.

Thanks!

@tixxdz tixxdz self-assigned this Mar 30, 2016
@hectorj2f
Contributor Author

@tixxdz tests are passing, what's next?

@antrik
Contributor

antrik commented Apr 6, 2016

@hectorj2f To be honest, I think the main hurdle is that we are still pretty new to this, and it's hard for any of us to judge whether such a change is indeed a good idea... (while @jonboulle doesn't really have the time to think about this too much, I guess).

I haven't really looked into this myself; one thing that comes to mind, though, is that AIUI the cache gets out of sync if a unit state changes because of "external" factors, not at a direct request of fleet?... How does that deal with crashing units, for example, or unit dependencies fleet is not aware of?

BTW, this is a bit tangential, but there is one bit in your original message I do not agree with: "Obviously we declined this solution as it required modifying third-party software used by fleet." To quote a certain Lennart Poettering: "You own the entire stack." This is free software; you never have to take limitations in the underlying infrastructure for granted and work around them. Fix problems at the source.

@tixxdz
Contributor

tixxdz commented Apr 6, 2016

@hectorj2f Hi, sorry for the delay; we had to handle some other issues. At the same time I'm reviewing issue #478, since it relates to the one exposed here; I'm just trying to get the whole context from your comments and @jonboulle's ;-)

@hectorj2f
Contributor Author

@tixxdz Thanks

How does that deal with crashing units, for example, or unit dependencies fleet is not aware of?

@antrik Well, that has a very simple answer: this cache won't interfere, because the state of the systemd units is handled with the same logic as before. Please have a look at line 242 in systemd/manager.go. This cache only aims to reduce the workload caused by inactive units.

fleetd/fleetd.go Outdated
cfgset.Bool("disable_watches", false, "Disable the use of etcd watches. Increases scheduling latency")
cfgset.Bool("verify_units", false, "DEPRECATED - This option is ignored")
cfgset.String("authorized_keys_file", "", "DEPRECATED - This option is ignored")
cfgset.Bool("enable_unitstate_cache", true, "Enable an unit state cache to minimize the systemd and dbus communication overhead.")
Contributor


Could this be made false by default?

Contributor


Yes. At least when it's merged we'll leave it false by default; others can run it if they want, and after some time it can be switched to true. Thanks
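
Presumably the change just flips the default of the existing flag in fleetd/fleetd.go, along these lines:

```go
cfgset.Bool("enable_unitstate_cache", false, "Enable a unit state cache to minimize the systemd and dbus communication overhead.")
```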

Contributor Author


Done @tixxdz

@hectorj2f
Contributor Author

@tixxdz The PR builds seem to fail. Please ping me when it is fixed so I can push again. Thx

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

@hectorj2f Just an update: we are making the functional tests run both with and without the state cache enabled. We will merge the functional test change first, then later this PR. I may add some minor comments.

Thank you!

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

So, more notes:

  1. Did you investigate putting the transition of states inside, say,
     systemdUnitManager.cache_hashes { hashes map[string]unit.Hash; state string => {active or inactive} } ?

So we would have a new cache_hashes with two fields: when we unload a unit we don't remove it, we put it into the inactive state; the rest of the code should check whether the state is still active, and additions should check whether the state is inactive. This would allow us to use the same cache. On the other hand, I'm not sure whether it would really scale, because with the current implementation we just forget about the old cache (https://github.com/coreos/fleet/pull/1418/files#diff-561bbb797eb2f941c1384271efe28c0bR273), which will be collected, and the registerUnitOperation cache is also updated where appropriate. This of course saves memory, but in the end we may have three hashes alive at the same time; maybe we can achieve the same with two, or with one hash that survives the calls plus one used inside the functions as a temporary hash to update the final one?

  2. Would it be possible to have a benchmark comparison with a higher number of inactive units? Just to make sure before merging.

  3. It would be nice if we put the details of the logic inside functions; say, instead of:
     if m.enableUnitStateCache { delete(m.unitStateCache.registerUnitOperation, name) }

use something like:

func (m *systemdUnitManager) cacheUpdateActiveUnits(blabla....) { if !m.enableUnitStateCache { return } .... continue ....}

This allows us to hide the logic and maybe improve the cache later. A rough sketch of how 1) and 3) could fit together is below.
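
Purely illustrative, not the PR's actual code: the names cachedUnit, cacheMarkActive, and cacheMarkInactive are made up, and unit.Hash is replaced with a plain string to keep the snippet self-contained.

```go
package systemd

import "sync"

// Hypothetical single cache keeping the unit hash together with an
// active/inactive flag, so unloading a unit flips the flag instead of
// deleting the entry.
type cachedUnit struct {
	hash  string // in fleet this would be unit.Hash
	state string // "active" or "inactive"
}

type systemdUnitManager struct {
	mu                   sync.Mutex
	enableUnitStateCache bool
	cacheHashes          map[string]cachedUnit
}

// cacheMarkActive records a (re)loaded unit; the enable check is hidden here,
// as point 3 suggests, instead of being repeated at every call site.
func (m *systemdUnitManager) cacheMarkActive(name, hash string) {
	if !m.enableUnitStateCache {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.cacheHashes[name] = cachedUnit{hash: hash, state: "active"}
}

// cacheMarkInactive keeps the entry but marks it inactive when a unit is unloaded.
func (m *systemdUnitManager) cacheMarkInactive(name string) {
	if !m.enableUnitStateCache {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if cu, ok := m.cacheHashes[name]; ok {
		cu.state = "inactive"
		m.cacheHashes[name] = cu
	}
}
```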

So 1) needs further thinking; do you think it makes sense? I need more time to come up with something that can work, otherwise just use what's already here and improve it later.

And for 2) and 3), it would be awesome to update the PR based on these! Then it's just a matter of time before we merge this ;-)

Thanks for your patience.

@tixxdz
Contributor

tixxdz commented Apr 18, 2016

Also, squashing the commits where possible would be nice. BTW, we merged more functional tests for unit states these past days; I guess that's enough, or do you think there are corner cases that should be added to assert the functionality?

@hectorj2f
Contributor Author

@tixxdz I like the idea but I need to think about the corner cases.

Regarding 2) and 3), they are feasible and understandable :).

@tixxdz
Contributor

tixxdz commented Apr 21, 2016

@hectorj2f For 1), if it's too much then please don't bother. Without it, the feature will be an option disabled by default; later, if it can be done, it could be enabled by default for everyone. At the same time, if you use or depend on this, it would be nice to test PR #1556 and confirm.

Thank you!

@kayrus
Contributor

kayrus commented Apr 21, 2016

@hectorj2f I'm not quite sure, but I suppose this improvement should be done on the low-level side. I've started to implement systemd-dbus optimizations within this PR: #1564

@hectorj2f
Contributor Author

@kayrus My concern is mainly the inactive/dead units. I'll test your changes at the end of this week and let you know if there are any improvements. I am currently quite busy with other stuff :-/

@kayrus
Contributor

kayrus commented Apr 28, 2016

@hectorj2f I've reworked my patch a bit and created a systemd pull request. If you wish to test systemd v225, which is currently used by stable CoreOS, please use this patch: endocode/systemd@1fdb12b

I'll try to implement a new d-bus method that will optimize getUnitState soon.

kayrus added a commit to endocode/systemd that referenced this pull request May 3, 2016
Instead of retrieving all properties for each unit, we will pass a list of
units to the GetUnitsStates method and receive the units' states.
Resolves coreos/fleet#1418
kayrus added a commit to endocode/systemd that referenced this pull request May 11, 2016
This new method returns information by unit names. Unlike ListUnitsByPatterns,
this method returns information for inactive and even non-existent units.
Moved the dbus unit reply logic into a separate shared function.
Resolves coreos/fleet#1418
@hectorj2f
Contributor Author

After talking with @kayrus, I think this PR might become obsolete once #1564 is merged. I'd like to test it once that is done.

@jonboulle
Contributor

Let's consider this on hold in favour of #1564
