Skip to content

multipool: Fix allocations when deleting or shrinking pools#42126

Merged
pippolo84 merged 9 commits intocilium:mainfrom
pippolo84:pr/pippolo84/multi-pool-recreate-pool-fix
Oct 16, 2025
Merged

multipool: Fix allocations when deleting or shrinking pools#42126
pippolo84 merged 9 commits intocilium:mainfrom
pippolo84:pr/pippolo84/multi-pool-recreate-pool-fix

Conversation

@pippolo84
Copy link
Copy Markdown
Member

@pippolo84 pippolo84 commented Oct 10, 2025

When a CiliumPodIPPool is shrunk or deleted but its CIDRs are still in use by the nodes (i.e: there are some pods scheduled on those nodes with IPs from the removed CIDRs), we enter a status where the operator is not able to correctly account the allocated CIDRs anymore.
Specifically, it is possible to add again the deleted pool and end up with another node getting assigned the same CIDRs still in use by the initial node and ultimately having the same IPs assigned to multiple pods.

To avoid this scenario, beside emitting a warning when the operator detects that a CIDR still in use has been deleted, we introduce the concept of orphan CIDRs: CIDRs that do not have their parent pool anymore. The operator keeps track of those CIDRs specifically, in order to:

  1. reconcile the internal allocation maps in case the original pool is restored
  2. reject the insertion of new pools (i.e: pools with a different name) that contains the orphan CIDRs
  3. release the orphan CIDRs if the node do not claim that anymore

This is done in order to allow fixing the orphan CIDRs scenario by either:

a) restore the CiliumPodIPPool that should not have been shrunk or deleted initially
b) delete all pods using an IP from a deleted pool

Please refer to the individual commit messages for more information.

Fixes: #41986

Fix operator CIDR allocations in multi-pool IPAM mode after a CiliumPodIPPool update or re-insertion

@pippolo84 pippolo84 added release-note/bug This PR fixes an issue in a previous release of Cilium. area/ipam IP address management, including cloud IPAM area/multipool Affects Multi-Pool IPAM labels Oct 10, 2025
@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch from 30d095c to b823529 Compare October 10, 2025 20:43
@pippolo84 pippolo84 marked this pull request as ready for review October 10, 2025 21:02
@pippolo84 pippolo84 requested a review from a team as a code owner October 10, 2025 21:02
@pippolo84 pippolo84 requested a review from gandro October 10, 2025 21:02
@pippolo84

This comment was marked as outdated.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch 3 times, most recently from ced20c8 to 0985735 Compare October 11, 2025 08:13
@pippolo84

This comment was marked as outdated.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch from 0985735 to eb45a53 Compare October 11, 2025 08:15
@pippolo84
Copy link
Copy Markdown
Member Author

/test

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! One thing around operator restart that we need to think about a bit

@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch 2 times, most recently from 36cbc58 to 0478ca5 Compare October 13, 2025 21:16
Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, thanks! One minor piece of feedback

When one or more CIDRs are removed from a CiliumPodIPPool, the allocator
scans the list of allocated CIDRs to emit a warning in case a node is
still using one of the removed CIDRs. Moreover, the removed CIDRs are
deleted from the operator internal map of allocated CIDRs, to keep its
status consistent with the latest snapshot of the CiliumPodIPPool
resource.

Since the CiliumPodIPPool allows to specify a mask size, the CIDR
belonging to the custom resource is usually sliced into smaller CIDRs
that are individually allocated to nodes.

As an example, consider the following CiliumPodIPPool:

```
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: test-pool
spec:
  ipv4:
    cidrs:
    - 10.100.0.0/16
    maskSize: 24
```

A CiliumNode requesting allocations from that pool will end up with the
following CIDRs: "10.100.0.0/24", "10.100.1.0/24", 10.100.2.0/24 and so
on, depending on how many IPs are requested.

The previous logic to check if a removed CIDR is still in use was using
the entire CIDR (i.e: "10.100.0.0/16") from the CiliumPodIPPool, instead
of checking if the allocated CIDRs (i.e: "10.100.1.0/24",
"10.100.2.0/24" and so on) were part of the removed one.

The commit fixes the bug and adds a unit test to prove that the
CiliumPodIPPool update is now handled correctly.

Signed-off-by: Fabio Falzoi <[email protected]>
Use the pool name as a key into the per-node in-use pools map instead of
looping over each key to find a match.

Signed-off-by: Fabio Falzoi <[email protected]>
Rely on the same code path to both add a new pool and to update an
existing one, removing the old function that was taking care of the
insertion separately.

This commit is not meant to change anything in the pools handling logic,
but it allows to apply additional checks while handling a pool upsertion
in a single place, being a new pool or an update of an existing one.

Signed-off-by: Fabio Falzoi <[email protected]>
Whenever a CIDR is removed from a pool or a pool is removed altogether,
there might be nodes that are still using those CIDRs. Previously, the
operator would emit a warning, but would not track those CIDRs, leading
to issues in case of a subsequent upsertion of the same CIDRs.

Specifically, consider the following sequence of events:

- CiliumPodIPPool is added with a CIDR "10.10.0.0/16" and a mask size
  equal to 24
- A CiliumNode requests IPs from that pool, and got assigned the CIDR
  "10.10.1.0/24"
- The CiliumPodIPPool is deleted. The operator emits a warning, but the
  allocated CIDR is released from its internal allocation maps.
  Nonetheless, the CIDR is still in use by the node and there might be
  pods scheduled on it with IPs belonging to that CIDR.
- The previously deleted CiliumPodIPPool is added again. No more actions
  are taken by the operator, that is, the operator thinks "10.10.1.0/24"
  is free to use.
- A new node joins the cluster and requests a CIDR allocation from the
  same pool. The CIDR "10.10.1.0/24" is assigned to it, ultimately
  leading to multiple pods on different nodes having the same IPs.

To avoid the above scenario we add the concept of "orphan CIDRs": an
orphan CIDR is a CIDR still allocated to the node but not listed in any
pools anymore, either because the parent CiliumPodIPPool has been
removed or shrunk.

When the operator detects that a CIDR has been orphaned, it stores it in
a separate map. The map will be checked at each subsequent
CiliumPodIPPool upsert event: if the original pool containing the orphan
CIDR is restored, the CIDR is marked as allocated to reconcile the
allocations status in the operator internal maps with the effective one.

Fixes: cilium#41986

Signed-off-by: Fabio Falzoi <[email protected]>
Since orphan CIDRs are stored in a map, we lose track of them after an
operator restart. Therefore, at startup, after the allocator received
all the CiliumPodIPPool resources from k8s, it will mark as orphan all
the CIDRs that are "allocated" in the CiliumNodes.

This allows to reconstruct the status of the orphan CIDRs after the
restart, even if a CiliumPodIPPool was deleted but its CIDRs were still
in use by nodes.

Signed-off-by: Fabio Falzoi <[email protected]>
Add a test to verify that a newly added pool with a name different than
the one originating the orphan CIDRs cannot "steal" those CIDRs.

Since we always keep orphan CIDRs untouched in the
spec.IPAM.Pools.Allocated section of the CiliumNode, we cannot reassign
those CIDRs to other pools, otherwise the allocations reported in the
CiliumNode would be out of date.  Therefore, in order to keep the
CiliumNode always consistent, we return an error when upserting a new
pool. To unorphan a CIDR, the original pool must be restored first.

Signed-off-by: Fabio Falzoi <[email protected]>
When a pool is updated, we have to loop over all the old CIDRs to remove
the ones that are not part of the latest version of the pool.

However, we should not immediately delete the removed CIDRs while
looping on them, because doing that interfere with the loop itself,
leading to incorrect results.  Instead we should mark each stale CIDR
and finally remove them from the slice later, once the loop is finished.

This can be proven by running the new unit test, that was failing before
this change:

=== RUN   TestUpdatePoolKeepOldCIDRs
    pool_allocator_test.go:1082:
        	Error Trace:	github.com/cilium/cilium/pkg/ipam/allocator/multipool/pool_allocator_test.go:1082
        	Error:      	Should be false
        	Test:       	TestUpdatePoolKeepOldCIDRs
--- FAIL: TestUpdatePoolKeepOldCIDRs (0.00s)

Signed-off-by: Fabio Falzoi <[email protected]>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch from 0478ca5 to 2fc8c99 Compare October 15, 2025 17:45
@pippolo84
Copy link
Copy Markdown
Member Author

pippolo84 commented Oct 15, 2025

@gandro I kept the release of the orphan CIDRs when they are not used by nodes anymore in a separate commit. Same for the test that assert that "stealing" orphan CIDRs is not possible (see this commit). Hope that will ease the further round of review a bit.

Also, while working on the above I've discovered another bug and fixed it, see this commit.

I understand the PR is becoming pretty convoluted, if you prefer I can try to split it, let me know.

@pippolo84
Copy link
Copy Markdown
Member Author

/test

@pippolo84
Copy link
Copy Markdown
Member Author

ci-integration failure seems legit, marking this back to draft while I investigate.

@pippolo84 pippolo84 marked this pull request as draft October 16, 2025 07:29
Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

When a node does not claim an orphan CIDR anymore (i.e: when all pods
scheduled on the node with an IP from an orphan CIDR are deleted)
release the orphan CIDR in the pool allocator just like we do for
non-orphan CIDRs.

Signed-off-by: Fabio Falzoi <[email protected]>
Whenever a pool is updated, we have to reconcile the status of the pool
allocator accordingly, for both v4 and v6 allocators. Simplify and
improve the readability of the update logic.

Signed-off-by: Fabio Falzoi <[email protected]>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/multi-pool-recreate-pool-fix branch from 2fc8c99 to ca8bb6d Compare October 16, 2025 08:38
@pippolo84
Copy link
Copy Markdown
Member Author

ci-integration failure seems legit, marking this back to draft while I investigate.

It was just a matter of using the correct c *assert.Collect in the condition passed to assert.EventuallyWithT, fixed.

@gandro also added a very minor cleanup to DRY updateCIDRSets a bit, see ca8bb6d.

@pippolo84 pippolo84 marked this pull request as ready for review October 16, 2025 08:41
@pippolo84
Copy link
Copy Markdown
Member Author

/test

@pippolo84 pippolo84 enabled auto-merge October 16, 2025 09:40
@pippolo84 pippolo84 added this pull request to the merge queue Oct 16, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 16, 2025
Merged via the queue into cilium:main with commit 2a1b7c7 Oct 16, 2025
73 checks passed
@pippolo84 pippolo84 deleted the pr/pippolo84/multi-pool-recreate-pool-fix branch October 16, 2025 10:14
@pippolo84 pippolo84 added the affects/v1.18 This issue affects v1.18 branch label Oct 20, 2025
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects/v1.18 This issue affects v1.18 branch area/ipam IP address management, including cloud IPAM area/multipool Affects Multi-Pool IPAM ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

Issue when extending CiliumPodIPPool with new CIDRs

3 participants