endpoint: stop regenerating all endpoints on every identity allocation; switch to periodic regens instead. #35815
Conversation
FYI @jrajahalme, I feel safe making this change now that you fixed the leak that caused Envoy to miss some incremental updates.
Could we end up in future reintroducing a bug where Cilium never gets around to reconciling the proper policy state? The primary reason for doing this regeneration sometimes was to make sure that there's a full recalculation fallback in case some incremental logic was incorrect. Although I think that we shouldn't have to do a bunch of wasteful additional work every time there is a change in the policy engine overall, I think it's interesting to consider what our resiliency mechanisms should look like in order to mitigate such risks.
Of course, that's absolutely possible. I'm less concerned about policy state, but rather everything else that may be depending on periodic endpoint regeneration. I've given the code a look-through, and it seems OK, which is worth approximately nothing in terms of reassurance :). I agree that, in the interest of resilience, we should probably periodically regenerate endpoints. Ideally we could detect changes and warn on them, but given that the underlying state may have coincidentally changed, this would be a very noisy signal. Using identity allocation as a trigger for occasional endpoint regeneration is not a good timing signal: noisy clusters will regenerate constantly, whereas clusters with a static set of identities will not -- even when regeneration-relevant configuration may have changed. So, I can add a periodic regeneration controller to the EndpointManager. That would be more useful for resiliency, and could be a significant reduction in CPU usage for certain workloads.
/test
gandro
left a comment
My codeowners look good to me now! Some nits on the docs, but nothing blocking
When an identity is allocated, it needs to be plumbed into the policy engine. The mechanism for this is the IdentityAllocatorOwner, which pushes down identity adds and deletes. All node-local allocations are synchronously inserted into the SelectorCache as part of the control flow.

Separately, there is a channel of events from the global allocator's backend. As identities are allocated, they are pushed into the SelectorCache via this mechanism. This is asynchronous, as part of the remote change stream.

However, there was an awkward duplication: the local-only allocation path (i.e. an identity for a local CIDR) did *both* the synchronous SelectorCache update *and* emitted an event on the event channel. The event channel is purely internal; it is consumed from within the allocator cache, and its only effect is to update the SelectorCache, same as the synchronous case.

This commit removes the redundant update. This is safe because node-local allocation *always* updates synchronously.

Signed-off-by: Casey Callendrello <[email protected]>
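The duplication described in this commit message can be made concrete with a small sketch. All names here are illustrative stand-ins, not Cilium's actual types: `selectorCache` just counts how many times each identity is pushed in, making it visible that the old local-only path processed every identity twice.

```go
package main

import "fmt"

// selectorCache is a hypothetical stand-in for Cilium's SelectorCache;
// it counts how many times each identity is pushed in.
type selectorCache struct{ updates map[uint32]int }

func (s *selectorCache) UpdateIdentities(id uint32) { s.updates[id]++ }

func main() {
	sc := &selectorCache{updates: map[uint32]int{}}
	events := make(chan uint32, 1)

	// Old local-only allocation path: a synchronous update AND an event.
	allocateLocalOld := func(id uint32) {
		sc.UpdateIdentities(id) // synchronous insert into the cache
		events <- id            // redundant: the consumer below updates again
	}
	allocateLocalOld(42)

	// The event channel is purely internal; its only effect is another update.
	sc.UpdateIdentities(<-events)

	fmt.Println(sc.updates[42]) // prints 2: the identity was processed twice
}
```

Dropping the event emission (the second line of `allocateLocalOld`) leaves the count at 1, which is the behavior this commit restores.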
Prior to this change, there were two possible ways a newly-allocated identity would be added to the SelectorCache.

The first was via the ipcache, which allocates with `notifyOwner: false`, taking the responsibility of propagating updates directly. The IPCache first updates the SelectorCache, then triggers incremental policy map updates on all endpoints. This works well.

Alternatively, an identity could be allocated by the global allocator. In this case, `notifyOwner: true` was passed, and the allocator itself took responsibility for updating the SelectorCache. This would regenerate all endpoints, rather than merely applying incremental updates. This also works, but is wasteful.

Finally, if a remote node allocates an identity, it will be synchronized via the Allocator backend, sent on a watch channel, inserted into the SelectorCache, and all endpoints will be regenerated.

Regenerating all endpoints is quite wasteful; we have a functional incremental policy update mechanism that handles exactly this case. There were previously issues wherein Envoy would lose updates, but those have been fixed. So, change it so that *all* identity updates trigger only incremental updates. Additionally, batch changes so we don't flood endpoint locking.

Finally, when an identity is allocated by this node, we would process the new identity twice: once on allocation, and once when the remote backend delivers the watch event. Add some de-duplication so we don't needlessly process events.

Signed-off-by: Casey Callendrello <[email protected]>
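The batching and de-duplication this commit describes can be sketched as follows. This is a hypothetical, simplified illustration (the type and method names are invented, not Cilium's): identities allocated locally are recorded, and when the same identity later arrives on the remote watch channel, that event is dropped instead of being re-processed; all additions are batched and drained in one flush.

```go
package main

import (
	"fmt"
	"sync"
)

// NumericIdentity stands in for Cilium's identity.NumericIdentity.
type NumericIdentity uint32

// identityBatcher batches identity additions and drops watch events
// for identities already processed via the local allocation path.
type identityBatcher struct {
	mu          sync.Mutex
	locallySeen map[NumericIdentity]struct{}
	pendingAdds []NumericIdentity
}

func newIdentityBatcher() *identityBatcher {
	return &identityBatcher{locallySeen: make(map[NumericIdentity]struct{})}
}

// OnLocalAllocate is called on the synchronous, node-local path.
func (b *identityBatcher) OnLocalAllocate(id NumericIdentity) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.locallySeen[id] = struct{}{}
	b.pendingAdds = append(b.pendingAdds, id)
}

// OnWatchEvent is called for events from the global allocator backend.
// It returns false when the event duplicates a local allocation.
func (b *identityBatcher) OnWatchEvent(id NumericIdentity) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, dup := b.locallySeen[id]; dup {
		delete(b.locallySeen, id) // consumed once; later events are real
		return false
	}
	b.pendingAdds = append(b.pendingAdds, id)
	return true
}

// Flush drains the batch, e.g. into one incremental policy update,
// so many allocations cost a single round of endpoint locking.
func (b *identityBatcher) Flush() []NumericIdentity {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.pendingAdds
	b.pendingAdds = nil
	return out
}

func main() {
	b := newIdentityBatcher()
	b.OnLocalAllocate(1001)
	fmt.Println(b.OnWatchEvent(1001)) // false: duplicate of the local allocation
	fmt.Println(b.OnWatchEvent(2002)) // true: genuinely remote identity
	fmt.Println(b.Flush())            // [1001 2002], applied in one batch
}
```

The point of the batch is the locking pattern: endpoints are touched once per flush rather than once per allocated identity.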
A previous commit stopped regenerating all endpoints on identity allocation, which was wasteful. However, it is still useful to have periodic state reconciliation, in case a configuration edge is somehow missed. So, this adds an explicit controller in the EndpointManager that periodically regenerates all endpoints. By default, the interval is 2 minutes, but it is configurable. Signed-off-by: Casey Callendrello <[email protected]>
These metrics are not interesting:
1. They were incremented under a "policy update" label, which is confusing.
2. They ignored the asynchronous nature of endpoint regeneration, so the reported duration was incorrect.

Signed-off-by: Casey Callendrello <[email protected]>
The PolicyUpdater is a small trigger that batches requests to regenerate all endpoints. Now that there is a controller for this directly in the EndpointManager, we can remove this additional trigger. Signed-off-by: Casey Callendrello <[email protected]>
The metrics `cilium_triggers_policy_update_total` and `cilium_triggers_policy_update_folds` have been removed. They were never particularly accurate, as this trigger was only used when auxiliary resources such as CiliumEnvoyConfigs were updated, rather than actual policy updates. Furthermore, the measured duration was incorrect as the actual updates were asynchronous. Signed-off-by: Casey Callendrello <[email protected]>
/test
tommyp1ckles
left a comment
Looks good for @cilium/cli
joestringer
left a comment
LGTM. I opened some non-blocking threads below but they're more just thinking out loud about some minor tuning that can follow or enquiring for me to learn more 😅
Please review by commit.
For resilience purposes, we now periodically regenerate all endpoints (default: every 2 minutes); a controller added to the EndpointManager handles this. Finally, the now-redundant PolicyUpdater has lost its trigger and simply delegates to the EndpointManager.