multipool: Fix allocations when deleting or shrinking pools #42126
Merged
pippolo84 merged 9 commits into cilium:main on Oct 16, 2025
Conversation
Force-pushed from 30d095c to b823529
Force-pushed from ced20c8 to 0985735
Force-pushed from 0985735 to eb45a53
pippolo84 (Member, Author) commented:
/test
gandro (Member) reviewed on Oct 13, 2025:
Awesome work! One thing around operator restart that we need to think about a bit.
36cbc58 to
0478ca5
Compare
gandro (Member) reviewed on Oct 14, 2025:
Neat, thanks! One minor piece of feedback.

gandro reviewed on Oct 14, 2025
When one or more CIDRs are removed from a CiliumPodIPPool, the allocator
scans the list of allocated CIDRs to emit a warning in case a node is
still using one of the removed CIDRs. Moreover, the removed CIDRs are
deleted from the operator's internal map of allocated CIDRs, to keep its
status consistent with the latest snapshot of the CiliumPodIPPool
resource.
Since a CiliumPodIPPool allows specifying a mask size, the CIDR
belonging to the custom resource is usually sliced into smaller CIDRs
that are individually allocated to nodes.
As an example, consider the following CiliumPodIPPool:
```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: test-pool
spec:
  ipv4:
    cidrs:
      - 10.100.0.0/16
    maskSize: 24
```
A CiliumNode requesting allocations from that pool will end up with CIDRs
such as "10.100.0.0/24", "10.100.1.0/24", "10.100.2.0/24", and so on,
depending on how many IPs are requested.
The previous logic that checked whether a removed CIDR was still in use
compared against the entire pool CIDR (i.e. "10.100.0.0/16") from the
CiliumPodIPPool, instead of checking whether the allocated CIDRs
(i.e. "10.100.1.0/24", "10.100.2.0/24", and so on) were contained in the
removed one.
This commit fixes the bug and adds a unit test to prove that the
CiliumPodIPPool update is now handled correctly.
Signed-off-by: Fabio Falzoi <[email protected]>
Use the pool name as a key into the per-node in-use pools map instead of looping over each key to find a match.

Signed-off-by: Fabio Falzoi <[email protected]>
Rely on the same code path both to add a new pool and to update an existing one, removing the old function that handled insertion separately. This commit is not meant to change anything in the pool handling logic, but it allows applying additional checks while handling a pool upsert in a single place, whether it is a new pool or an update of an existing one.

Signed-off-by: Fabio Falzoi <[email protected]>
Whenever a CIDR is removed from a pool, or a pool is removed altogether, there might be nodes that are still using those CIDRs. Previously, the operator would emit a warning but would not track those CIDRs, leading to issues in case of a subsequent upsert of the same CIDRs. Specifically, consider the following sequence of events:

- A CiliumPodIPPool is added with a CIDR "10.10.0.0/16" and a mask size equal to 24.
- A CiliumNode requests IPs from that pool and is assigned the CIDR "10.10.1.0/24".
- The CiliumPodIPPool is deleted. The operator emits a warning, but the allocated CIDR is released from its internal allocation maps. Nonetheless, the CIDR is still in use by the node, and there might be pods scheduled on it with IPs belonging to that CIDR.
- The previously deleted CiliumPodIPPool is added again. No further action is taken by the operator, that is, the operator thinks "10.10.1.0/24" is free to use.
- A new node joins the cluster and requests a CIDR allocation from the same pool. The CIDR "10.10.1.0/24" is assigned to it, ultimately leading to multiple pods on different nodes having the same IPs.

To avoid the above scenario we introduce the concept of "orphan CIDRs": an orphan CIDR is a CIDR still allocated to a node but no longer listed in any pool, either because the parent CiliumPodIPPool has been removed or because it has been shrunk. When the operator detects that a CIDR has been orphaned, it stores it in a separate map. The map is checked at each subsequent CiliumPodIPPool upsert event: if the original pool containing the orphan CIDR is restored, the CIDR is marked as allocated to reconcile the allocation status in the operator's internal maps with the effective one.

Fixes: cilium#41986

Signed-off-by: Fabio Falzoi <[email protected]>
Since orphan CIDRs are stored in a map, we lose track of them after an operator restart. Therefore, at startup, after the allocator has received all the CiliumPodIPPool resources from k8s, it marks as orphan all the CIDRs that are "allocated" in the CiliumNodes. This allows reconstructing the status of the orphan CIDRs after the restart, even if a CiliumPodIPPool was deleted while its CIDRs were still in use by nodes.

Signed-off-by: Fabio Falzoi <[email protected]>
Add a test to verify that a newly added pool with a name different from the one originating the orphan CIDRs cannot "steal" those CIDRs. Since we always keep orphan CIDRs untouched in the spec.IPAM.Pools.Allocated section of the CiliumNode, we cannot reassign those CIDRs to other pools; otherwise, the allocations reported in the CiliumNode would be out of date. Therefore, in order to keep the CiliumNode always consistent, we return an error when upserting such a new pool. To unorphan a CIDR, the original pool must be restored first.

Signed-off-by: Fabio Falzoi <[email protected]>
When a pool is updated, we have to loop over all the old CIDRs to remove
the ones that are not part of the latest version of the pool.
However, we should not delete the removed CIDRs while still looping over
them, because doing so interferes with the loop itself, leading to
incorrect results. Instead, we should mark each stale CIDR and remove
them from the slice later, once the loop is finished.
This can be proven by running the new unit test, which was failing
before this change:

    === RUN   TestUpdatePoolKeepOldCIDRs
        pool_allocator_test.go:1082:
            Error Trace:  github.com/cilium/cilium/pkg/ipam/allocator/multipool/pool_allocator_test.go:1082
            Error:        Should be false
            Test:         TestUpdatePoolKeepOldCIDRs
    --- FAIL: TestUpdatePoolKeepOldCIDRs (0.00s)
Signed-off-by: Fabio Falzoi <[email protected]>
Force-pushed from 0478ca5 to 2fc8c99
pippolo84 (Member, Author) commented:
@gandro I kept the release of the orphan CIDRs when they are not used by nodes anymore in a separate commit. The same goes for the test that asserts that "stealing" orphan CIDRs is not possible (see this commit). I hope that will ease the next round of review a bit. Also, while working on the above I discovered another bug and fixed it, see this commit. I understand the PR is becoming pretty convoluted; if you prefer, I can try to split it, let me know.
pippolo84 (Member, Author) commented:
/test
pippolo84 (Member, Author) commented:
ci-integration failure seems legit, marking this back to draft while I investigate.
When a node no longer claims an orphan CIDR (i.e., when all pods scheduled on the node with an IP from an orphan CIDR are deleted), release the orphan CIDR in the pool allocator just like we do for non-orphan CIDRs.

Signed-off-by: Fabio Falzoi <[email protected]>
Whenever a pool is updated, we have to reconcile the status of the pool allocator accordingly, for both the v4 and v6 allocators. Simplify and improve the readability of the update logic.

Signed-off-by: Fabio Falzoi <[email protected]>
Force-pushed from 2fc8c99 to ca8bb6d
pippolo84 (Member, Author) commented:
/test
gandro approved these changes on Oct 16, 2025
When a CiliumPodIPPool is shrunk or deleted but its CIDRs are still in use by nodes (i.e., there are pods scheduled on those nodes with IPs from the removed CIDRs), we enter a state where the operator is no longer able to correctly account for the allocated CIDRs.
Specifically, it is possible to add the deleted pool again and end up with another node being assigned the same CIDRs still in use by the initial node, ultimately having the same IPs assigned to multiple pods.
To avoid this scenario, besides emitting a warning when the operator detects that a CIDR still in use has been deleted, we introduce the concept of orphan CIDRs: CIDRs that no longer have a parent pool. The operator keeps track of those CIDRs specifically, in order to allow fixing the orphan CIDRs scenario by either:
a) restoring the CiliumPodIPPool that should not have been shrunk or deleted in the first place, or
b) deleting all pods using an IP from the deleted pool.
Please refer to the individual commit messages for more information.
Fixes: #41986