tests: remove identity manager from ignored error messages #42982

Merged
pchaigno merged 1 commit into cilium:main from odinuge:odinuge/remove-bad-log on Dec 18, 2025

@odinuge
Member

@odinuge odinuge commented Nov 25, 2025

This removes a log line from the list of ignored errors. We want to catch this error because it points directly to a correctness issue: when it occurs, the reference counting of identity use is off, so either there is a leak or something is detached prematurely, which can cause policy issues.

Fixes: #16419
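
To illustrate the kind of reference-counting mismatch described above, here is a minimal sketch. The `refCounter` type and its methods are hypothetical and are not the actual identity manager code; they only show how an unbalanced remove can be detected.

```go
package main

import "fmt"

// refCounter is a hypothetical stand-in for an identity manager: it counts
// how many users currently hold each numeric identity.
type refCounter struct {
	refs map[uint32]int
}

func newRefCounter() *refCounter {
	return &refCounter{refs: make(map[uint32]int)}
}

// Add records one more user of the identity.
func (r *refCounter) Add(id uint32) {
	r.refs[id]++
}

// Remove drops one user of the identity. It returns an error when the
// identity was never added or was already fully released, which is the
// situation the log line in this PR reports.
func (r *refCounter) Remove(id uint32) error {
	n, ok := r.refs[id]
	if !ok {
		return fmt.Errorf("removing identity %d not added to the manager", id)
	}
	if n == 1 {
		delete(r.refs, id)
	} else {
		r.refs[id] = n - 1
	}
	return nil
}

func main() {
	m := newRefCounter()
	m.Add(80048)
	fmt.Println(m.Remove(80048)) // balanced add/remove: no error
	fmt.Println(m.Remove(80048)) // one remove too many: error
}
```

An extra remove like the second call means some caller released an identity it did not hold, which is why hiding the message in tests can mask real bugs.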

@maintainer-s-little-helper

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 25, 2025
@odinuge
Member Author

odinuge commented Nov 25, 2025

/test

@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label Nov 25, 2025
@odinuge
Member Author

odinuge commented Nov 25, 2025

/ci-integration

1 similar comment
@odinuge
Member Author

odinuge commented Nov 25, 2025

/ci-integration

Member

@christarazi christarazi left a comment

LGTM. I added to the PR desc. that this PR fixes #16419.

@odinuge
Member Author

odinuge commented Nov 27, 2025

/test

@odinuge odinuge force-pushed the odinuge/remove-bad-log branch from f07293f to 77582d4 Compare November 27, 2025 13:18
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Nov 27, 2025
@odinuge
Member Author

odinuge commented Nov 27, 2025

/test

@odinuge
Member Author

odinuge commented Nov 27, 2025

Wow, this triggered in the downgrade to v1.18.4 test (here)!

📋 Test Report [cilium-test-1]
❌ 1/8 tests failed (1/168 actions), 118 tests skipped, 3 scenarios skipped:
Test [check-log-errors]:
  🟥 check-log-errors/no-errors-in-logs:pkg/identity/identitymanager:kind-cluster1/kube-system/cilium-wjt5v (cilium-agent): Found 1 logs in kind-cluster1/kube-system/cilium-wjt5v (cilium-agent) matching list of errors that must be investigated:
time=2025-11-27T13:42:10.63656636Z level=error source=/go/src/github.com/cilium/cilium/pkg/identity/identitymanager/manager.go:145 msg="removing identity not added to the identity manager!" module=agent.controlplane.identity.identity-manager identity=80048 (1 occurrences)

I'm pretty sure this is triggering the issue fixed in #42662 and #42661!

I guess we then either wait for a new release to merge this PR, or we start backporting the fixes.

cc @pchaigno for opening the original issue here.
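
For readers unfamiliar with the log check that failed above, the idea can be sketched roughly as follows. The function name and the plain substring matching are assumptions made for this sketch, not the actual cilium-cli implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// findUnexpectedErrors returns the error-level lines that are not covered by
// any entry in the ignore list. Substring matching is an assumption of this
// sketch.
func findUnexpectedErrors(lines, ignored []string) []string {
	var found []string
	for _, line := range lines {
		if !strings.Contains(line, "level=error") {
			continue
		}
		skip := false
		for _, msg := range ignored {
			if strings.Contains(line, msg) {
				skip = true
				break
			}
		}
		if !skip {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	const msg = "removing identity not added to the identity manager!"
	logs := []string{
		`level=info msg="starting"`,
		`level=error msg="` + msg + `" identity=80048`,
	}
	fmt.Println(len(findUnexpectedErrors(logs, []string{msg}))) // message on the ignore list
	fmt.Println(len(findUnexpectedErrors(logs, nil)))           // message no longer ignored
}
```

Removing the message from the ignore list, as this PR does, moves the log line from the first case to the second.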

@christarazi
Member

@odinuge Nice! I've marked the mentioned PRs for backport to v1.18 since that still falls under the backporting criteria.

@odinuge
Member Author

odinuge commented Dec 10, 2025

Both PRs are now backported. Once the new v1.18.5 release is built, I'll rebase this and mark ready-for-review.

@pchaigno
Member

Why do we need to wait for a v1.18 release? AFAIK, the up/downgrade tests always test the latest branches (v1.18 <> main), not the releases.

@odinuge
Member Author

odinuge commented Dec 11, 2025

Ahh, interesting. I based it off the log lines stating:

ℹ️  Cilium version: 1.18.4
🏃[cilium-test-1] Running 126 tests ...

@pchaigno
Member

I believe that's just the CLI trying to detect the last version to know what tests can be executed.

@odinuge odinuge force-pushed the odinuge/remove-bad-log branch from 77582d4 to 03925db Compare December 11, 2025 15:59
@odinuge
Member Author

odinuge commented Dec 11, 2025

Ahh, interesting! I've rebased and fixed the conflict now, so let's see!

@odinuge
Member Author

odinuge commented Dec 11, 2025

/test

1 similar comment
@odinuge
Member Author

odinuge commented Dec 11, 2025

/test

@odinuge odinuge force-pushed the odinuge/remove-bad-log branch from 03925db to 26a5df9 Compare December 11, 2025 17:55
@odinuge
Member Author

odinuge commented Dec 16, 2025

/test

@odinuge
Member Author

odinuge commented Dec 16, 2025

/ci-ipsec-e2e

@odinuge
Member Author

odinuge commented Dec 16, 2025

/ci-gateway-api

@odinuge
Member Author

odinuge commented Dec 16, 2025

/ci-ginkgo

@odinuge
Member Author

odinuge commented Dec 16, 2025

^ all looked like other flakes, so I'll try rerun

@odinuge
Member Author

odinuge commented Dec 16, 2025

/ci-clustermesh

@odinuge
Member Author

odinuge commented Dec 17, 2025

Looks like @pchaigno is correct that the tests run the agent code from the latest commit on each release branch, so this looks g2g now.

@pchaigno pchaigno added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Dec 18, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 18, 2025
@pchaigno pchaigno removed the request for review from YutaroHayakawa December 18, 2025 04:22
@pchaigno pchaigno added this pull request to the merge queue Dec 18, 2025
@pchaigno
Member

Thanks!

Merged via the queue into cilium:main with commit bc86dba Dec 18, 2025
98 of 112 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Dec 18, 2025
zocimek pushed a commit to zocimek/home-ops that referenced this pull request Feb 1, 2026
…0 ) (#584)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [aqua:cilium/cilium-cli](https://redirect.github.com/cilium/cilium-cli) | minor | `0.18.9` → `0.19.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>cilium/cilium-cli (aqua:cilium/cilium-cli)</summary>

### [`v0.19.0`](https://redirect.github.com/cilium/cilium-cli/releases/tag/v0.19.0)

[Compare Source](https://redirect.github.com/cilium/cilium-cli/compare/v0.18.9...v0.19.0)

## Summary of Changes

**CI Changes:**

- Add L7 policy traffic disruption tests
([cilium/cilium#42150](https://redirect.github.com/cilium/cilium/issues/42150),
[@fristonio](https://redirect.github.com/fristonio))
- Cilium-cli SNI connectivity tests now retry expected successful
operations to recover from failures due to external upstream issues.
([cilium/cilium#42980](https://redirect.github.com/cilium/cilium/issues/42980),
[@jrajahalme](https://redirect.github.com/jrajahalme))
- cli: connectivity: fix typo in L7 LB tests
([cilium/cilium#43610](https://redirect.github.com/cilium/cilium/issues/43610),
[@julianwiedmann](https://redirect.github.com/julianwiedmann))
- Fix intermittent NodePort connectivity test timeouts in dual-stack
clusters by validating NodePort readiness on all node IP addresses
during test setup.
([cilium/cilium#40812](https://redirect.github.com/cilium/cilium/issues/40812),
[@pillai-ashwin](https://redirect.github.com/pillai-ashwin))
- tests: remove identity manager from ignored error messages
([cilium/cilium#42982](https://redirect.github.com/cilium/cilium/issues/42982),
[@odinuge](https://redirect.github.com/odinuge))

**Misc Changes:**

- chore(deps): update all-dependencies (main)
([cilium/cilium#43169](https://redirect.github.com/cilium/cilium/issues/43169),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main)
([cilium/cilium#43456](https://redirect.github.com/cilium/cilium/issues/43456),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main)
([cilium/cilium#43508](https://redirect.github.com/cilium/cilium/issues/43508),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main)
([cilium/cilium#43457](https://redirect.github.com/cilium/cilium/issues/43457),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main)
([cilium/cilium#43538](https://redirect.github.com/cilium/cilium/issues/43538),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6)
(main)
([cilium/cilium#43303](https://redirect.github.com/cilium/cilium/issues/43303),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update go to v1.25.5 (main)
([cilium/cilium#43173](https://redirect.github.com/cilium/cilium/issues/43173),
[@cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- cilium-cli/connectivity: remove matcher for bpf/init.sh errors
([cilium/cilium#43109](https://redirect.github.com/cilium/cilium/issues/43109),
[@tklauser](https://redirect.github.com/tklauser))
- cilium-cli: convert net.IP to netip.Addr
([cilium/cilium#42371](https://redirect.github.com/cilium/cilium/issues/42371),
[@phuhung273](https://redirect.github.com/phuhung273))
- cli: Update `network-perf` image ref
([cilium/cilium#43297](https://redirect.github.com/cilium/cilium/issues/43297),
[@HadrienPatte](https://redirect.github.com/HadrienPatte))
- chore(deps): update golangci/golangci-lint-action action to v9.2.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3148](https://redirect.github.com/cilium/cilium-cli/pull/3148)
- Update stable release to v0.18.9 by
[@michi-covalent](https://redirect.github.com/michi-covalent) in
[#3149](https://redirect.github.com/cilium/cilium-cli/pull/3149)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3151](https://redirect.github.com/cilium/cilium-cli/pull/3151)
- chore(deps): update go to v1.25.5 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3153](https://redirect.github.com/cilium/cilium-cli/pull/3153)
- ci: clean up disk space in release workflow by
[@tklauser](https://redirect.github.com/tklauser) in
[#3154](https://redirect.github.com/cilium/cilium-cli/pull/3154)
- chore(deps): update actions/stale action to v10.1.1 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3150](https://redirect.github.com/cilium/cilium-cli/pull/3150)
- chore(deps): update gcr.io/distroless/static:latest docker digest to
[`4b2a093`](https://redirect.github.com/cilium/cilium-cli/commit/4b2a093)
by [@renovate](https://redirect.github.com/renovate)\[bot] in
[#3152](https://redirect.github.com/cilium/cilium-cli/pull/3152)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.2 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3155](https://redirect.github.com/cilium/cilium-cli/pull/3155)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6)
by [@renovate](https://redirect.github.com/renovate)\[bot] in
[#3156](https://redirect.github.com/cilium/cilium-cli/pull/3156)
- chore(deps): update actions/upload-artifact action to v6 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3157](https://redirect.github.com/cilium/cilium-cli/pull/3157)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`36b4f45`](https://redirect.github.com/cilium/cilium-cli/commit/36b4f45)
by [@renovate](https://redirect.github.com/renovate)\[bot] in
[#3160](https://redirect.github.com/cilium/cilium-cli/pull/3160)
- chore(deps): update dependency cilium/cilium to v1.18.5 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3159](https://redirect.github.com/cilium/cilium-cli/pull/3159)
- chore(deps): update dependency kubernetes-sigs/kind to v0.31.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3158](https://redirect.github.com/cilium/cilium-cli/pull/3158)
- chore(deps): update docker/setup-buildx-action action to v3.12.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3162](https://redirect.github.com/cilium/cilium-cli/pull/3162)
- chore(deps): update golangci/golangci-lint docker tag to v2.8.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3163](https://redirect.github.com/cilium/cilium-cli/pull/3163)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`6cc2338`](https://redirect.github.com/cilium/cilium-cli/commit/6cc2338)
by [@renovate](https://redirect.github.com/renovate)\[bot] in
[#3164](https://redirect.github.com/cilium/cilium-cli/pull/3164)
- chore(deps): update gcr.io/distroless/static:latest docker digest to
[`cd64bec`](https://redirect.github.com/cilium/cilium-cli/commit/cd64bec)
by [@renovate](https://redirect.github.com/renovate)\[bot] in
[#3165](https://redirect.github.com/cilium/cilium-cli/pull/3165)
- chore(deps): update actions/setup-go action to v6.2.0 by
[@renovate](https://redirect.github.com/renovate)\[bot] in
[#3166](https://redirect.github.com/cilium/cilium-cli/pull/3166)
- Prepare for v0.19.0 release by
[@tklauser](https://redirect.github.com/tklauser) in
[#3167](https://redirect.github.com/cilium/cilium-cli/pull/3167)

**Full Changelog**:
<cilium/cilium-cli@v0.18.9...v0.19.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

This PR has been generated by [Renovate
Bot](https://redirect.github.com/renovatebot/renovate).

Co-authored-by: zocimek-renovate[bot] <134739422+zocimek-renovate[bot]@users.noreply.github.com>
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
christarazi added a commit to christarazi/cilium that referenced this pull request Feb 13, 2026
Due to cilium#42661 and
cilium#42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
cilium#42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Feb 13, 2026
Due to #42661 and
#42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
#42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
jiashengz pushed a commit to Roblox/cilium that referenced this pull request Feb 23, 2026
* ci: Extend test timeout for ci-verifier

Verifier tests occasionally take a bit over 20m, so extend
the timeout to 25m.

Signed-off-by: Rastislav Szabo <[email protected]>

* ci: e2e: enhance readability of workflow job name

This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.

Prior to this, the name of each job contained the whole matrix combination,
which was cut off and unreadable in the UI. Given that we now
use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.

The result will be `Setup & Test (ipsec-1, minor)`.

Signed-off-by: Simone Magnani <[email protected]>

* ci: e2e: log matrix configuration in each job

This commit adds, as the first step of the `Setup & Test` job for e2e-upgrade,
a simple step to dump the current matrix configuration being tested.

The previous commit modified the title to display only the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration, which would be truncated in the UI anyway.

Given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the required configuration, but having a
step that dumps it should speed up and ease debugging in CI.

The output would be similar to:

```
> Log Matrix Configuration
Current matrix configuration:
{
  "name": "wireguard-1",
  "kernel": "5.10",
  "kube-proxy": "iptables",
  "kpr": "true",
  "devices": "{eth0,eth1}",
  "secondary-network": "true",
  "tunnel": "vxlan",
  "encryption": "wireguard",
  "encryption-node": "false",
  "lb-mode": "snat",
  "endpoint-routes": "true",
  "egress-gateway": "true",
  "ingress-controller": "true",
  "mode": "minor"
}
```

Signed-off-by: Simone Magnani <[email protected]>

* endpoint: Log labels as structured JSON objects

Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.

Signed-off-by: Jie WU <[email protected]>

```release-note
endpoint: Log labels as structured JSON objects
```

* feat(helm): hubble-ui containers set to pss-restricted

This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`

Signed-off-by: Pat Riehecky <[email protected]>

* helm: allow multicluster-services installCRDs to update CRDs

Previously, cilium-operator failed to start if MCS/installCRDs was enabled
because it did not have permission to update the CRD, logging this
message:

level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"

This patch adds the necessary permissions to cilium-operator if you have
mcs/installCRDs enabled

Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")

Signed-off-by: Florian Ströger <[email protected]>

* policy: (mechanical) refactor out flow lookup types

A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.

This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: add a simple iterative policy simulator

This is a simple userspace tool that executes rules step-by-step. Its
purpose is to validate more complex policy scenarios, ideally by
fuzzing.

To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.

Signed-off-by: Casey Callendrello <[email protected]>

* Add fuzz-based policy testing

This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: Fix fuzz testing

Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.

Fix the order of expected and actual for require.Equal.

Add more debugging.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Hide precedence details better

Hide precedence details from the policymap package.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add optional indexing by identity

Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add default deny rule when pass verdict is used

Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.

The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Fix L7Filter precedence handling for pass verdicts

Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before.  Would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.

Signed-off-by: Jarno Rajahalme <[email protected]>
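
The precedence order described above (deny over allow and pass, allow over pass) can be sketched as below. The `Verdict` type and the `rank` helper are hypothetical and only illustrate the ordering with Allow as the zero value; the real `HasPrecedenceOver()` in the commit may be defined on a different type.

```go
package main

import "fmt"

// Verdict is a hypothetical three-valued verdict. Allow is the zero value,
// matching the constraint mentioned in the commit message.
type Verdict uint8

const (
	Allow Verdict = iota
	Deny
	Pass
)

// rank maps a verdict to its strength: deny > allow > pass.
func (v Verdict) rank() int {
	switch v {
	case Deny:
		return 2
	case Allow:
		return 1
	default:
		return 0
	}
}

// HasPrecedenceOver reports whether v wins over other.
func (v Verdict) HasPrecedenceOver(other Verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(Deny.HasPrecedenceOver(Allow), Allow.HasPrecedenceOver(Pass), Pass.HasPrecedenceOver(Deny))
}
```

Because Allow is the zero value, the numeric value of the constant cannot double as its rank, which is why the sketch needs an explicit mapping.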

* policy: Fix per-tier priority range allocation

Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add Fuzz cases

Commit fuzzer cases found during development.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add generated L3/4 entries, multipass support

A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.

We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.

The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.

To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.

If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.

When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.

Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only called if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.

Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.

Signed-off-by: Jarno Rajahalme <[email protected]>
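
The clone-before-mutate rule for shared pass metadata can be sketched as follows. The `passMeta` and `entry` types and the `addPass` method are hypothetical simplifications, not the actual mapState code.

```go
package main

import (
	"fmt"
	"slices"
)

// passMeta and entry are hypothetical simplifications of the structures the
// commit message describes.
type passMeta struct {
	passPrecedence uint32 // assumed non-zero for every stored element
}

type entry struct {
	passes *[]passMeta // nil for entries without pass metadata
}

// addPass appends to a clone of the slice, so other entries that share the
// original slice are unaffected.
func (e *entry) addPass(p passMeta) {
	var cloned []passMeta
	if e.passes != nil {
		cloned = slices.Clone(*e.passes)
	}
	cloned = append(cloned, p)
	e.passes = &cloned
}

func main() {
	shared := []passMeta{{passPrecedence: 10}}
	a := entry{passes: &shared}
	b := entry{passes: &shared}
	a.addPass(passMeta{passPrecedence: 20})
	fmt.Println(len(*a.passes), len(*b.passes)) // a grew, b still sees the shared slice
}
```

Keeping the slice behind a pointer means an entry without pass metadata pays only for one nil pointer.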

* pkg/subnet: Fix tag in config subnets field

The Subnets field in the config declared a json tag, which made the
agent `hive` command fail (see below). The hive relies on a
mapstructure Decoder, not a JSON one, and therefore requires a
mapstructure tag when the config field name is not
equal to the flag name.

Fix the tag on the field using a mapstructure one.

```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]

goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
	/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
	/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```

Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <[email protected]>
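
Why a json tag does not help a mapstructure-based decoder can be seen with plain reflection. This is only a sketch: the struct names and the tag value `example-subnets` are made up for illustration and are not the real config.

```go
package main

import (
	"fmt"
	"reflect"
)

// jsonTagged mirrors the bug: the field only carries a json tag.
type jsonTagged struct {
	Subnets []string `json:"example-subnets"`
}

// fixed carries the tag key that a mapstructure-based decoder looks up.
type fixed struct {
	Subnets []string `mapstructure:"example-subnets"`
}

// tagFor returns the value of the named struct tag on the first field of v.
func tagFor(v any, key string) (string, bool) {
	return reflect.TypeOf(v).Field(0).Tag.Lookup(key)
}

func main() {
	_, ok := tagFor(jsonTagged{}, "mapstructure")
	fmt.Println(ok) // no mapstructure tag is present on the json-tagged field

	name, ok := tagFor(fixed{}, "mapstructure")
	fmt.Println(name, ok)
}
```

Each decoder only reads its own tag key, so the json tag is invisible to anything that looks up `mapstructure`.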

* linux-desired-device: introducing Cilium managed devices reconciler

Adding new reconciler in Cilium datapath/linux which can manage
life cycle of linux links created by Cilium.

Created devices are persisted on disk using a write-ahead log. Upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices will be pruned.

Implementation is inspired by linux/route/reconciler.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: script tests for desired-devices

Adding script tests to validate device creation and persistence.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: hook desired-devices cell into main infra

Signed-off-by: harsimran pabla <[email protected]>

* fix: helm intervalSeconds value render bug

intervalSeconds is always of an integral type, no need to check kindIs float64

Fixes: #44206
Signed-off-by: jayl1e <[email protected]>

* bpf: source tuple hash seeds from node config

Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* feat(install) Allow hubble to run with hostUsers: false

The hubble components do not require direct mapping of container
users to system users.

Signed-off-by: Pat Riehecky <[email protected]>

* docs: fix typos in comments

Signed-off-by: Yohei Yamamoto <[email protected]>

* bpf: correct comments in cil_from_netdev function

Removed conditions from the comment block describing the cil_from_netdev function since the logic has been changed

Signed-off-by: Liyi Huang <[email protected]>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* bpf: lb: Decouple DNAT operation from LB key

Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.

This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request

The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)

By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Handle DSR for remote NodePort services on source node

When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.

The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.

To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.

Implementation details:

- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
  defined and the regular service lookup fails, check if the destination
  is a remote node's IP with a NodePort port range. If so, perform a
  wildcard lookup (address=0) to find the NodePort service.

- Use wildcard key for backend selection: When dsr_internal flag is set,
  set key->address to 0 before calling lb4/6_select_backend_id(). This
  applies to both CT_NEW (new connections) and CT_REPLY (backend
  re-selection for existing connections). This is needed for backend
  selection algorithms that use slot lookup (e.g., Random), which look
  up backend slots via lb4/6_lookup_backend_slot() using the service
  key. Without a wildcard key, the lookup would fail because backend
  slot entries are stored with the wildcard service key, not with the
  remote node's IP.

- Store original destination in CT entry: The original destination
  address and port (remote node IP and NodePort) are stored in
  ct_state_new.nat_addr/nat_port, which will be written to the CT entry
  for use in reply path RevNAT processing.

- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
  __per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
  when the original destination info is no longer available in the packet.
  The per-CPU buffer preserves this info across the DNAT operation.

Existing connection handling:

- This change only affects DSR traffic destined to remote node's
  NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
  fails, but it only processes packets where the destination is a remote
  node IP with a port in the NodePort range. Other traffic that fails
  the regular lookup is unaffected.

Fixes: #41962

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Add RevNAT support for DSR remote NodePort connections

Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.

Implementation details:

- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
  When nat_port is non-zero, use these values directly for RevNAT.
  When nat_port is zero, fall back to the existing rev_nat_index lookup
  for backward compatibility with existing connections.

- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
  CT entry to ct_state, making them available for reply processing.

- Update ipv4/6_policy(): Check for nat_port in addition to
  rev_nat_index when deciding whether to perform RevNAT. Pass the
  CT entry's NAT information to lb4/6_rev_nat().

- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
  signature by passing NULL/0 for nat_addr/nat_port (these paths use
  the traditional rev_nat_index lookup).

Existing connection handling:

1. New connections (created after this patch):
   - Forward path stores nat_addr/nat_port in CT entry
   - Reply path uses CT entry's nat_addr/nat_port for RevNAT

2. Existing connections (regular NodePort/DSR traffic):
   - CT entry has nat_addr=0, nat_port=0
   - lb4/6_rev_nat checks nat_port first:
     - If nat_port != 0: use nat_addr/nat_port directly
     - If nat_port == 0: fall back to rev_nat_index lookup
   - This ensures existing connections continue to work

3. Upgrade scenario:
   - Existing connections keep working via rev_nat_index fallback
   - New DSR remote nodeport connections use nat_addr/nat_port
   - No connection disruption during rolling upgrade
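
The fallback order listed above can be sketched in Go as follows. The types `ctState` and `target` and the function `revNATTarget` are hypothetical models of the fields named in the text, not the datapath's actual C structures.

```go
package main

import "fmt"

// target is the address/port pair restored on a reply packet.
type target struct {
	addr string
	port uint16
}

// ctState is a hypothetical model of the CT state fields named in the text.
type ctState struct {
	natAddr     string // original destination address, empty when unset
	natPort     uint16 // original destination port, 0 when unset
	revNATIndex uint16 // legacy index into the RevNAT table
}

// revNATTarget prefers the NAT info stored in the CT entry. When natPort is
// zero (entries created before the change) it falls back to the index lookup.
func revNATTarget(st ctState, table map[uint16]target) (target, bool) {
	if st.natPort != 0 {
		return target{st.natAddr, st.natPort}, true
	}
	t, ok := table[st.revNATIndex]
	return t, ok
}

func main() {
	table := map[uint16]target{7: {"10.96.0.10", 53}}

	// New DSR remote NodePort connection: CT entry carries nat_addr/nat_port.
	fmt.Println(revNATTarget(ctState{natAddr: "192.168.1.2", natPort: 30080}, table))
	// Existing connection: nat_port is 0, so the index lookup is used.
	fmt.Println(revNATTarget(ctState{revNATIndex: 7}, table))
}
```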

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: tests: Add tests for DSR remote NodePort handling

Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.

Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):

1. Pod -> Remote NodePort -> Local backend (forward path)
   - Client pod sends packet to remote node's NodePort
   - LB selects a backend on the local node (same node as client)
   - Verifies packet is DNATed to local backend IP and port
   - Verifies CT entry contains correct nat_addr (remote node IP)
     and nat_port (NodePort)

2. Local backend -> Pod (reply path)
   - Backend sends reply packet to client
   - Verifies RevNAT is applied correctly
   - Source IP/port changed to remote node IP and NodePort

3. Pod -> Remote NodePort -> Self (hairpin)
   - Client pod sends packet to remote node's NodePort
   - LB selects the client pod itself as the backend
   - Verifies DNAT to client IP and backend port
   - Verifies SNAT to loopback IP for hairpin flow

4. Hairpin reply
   - Pod replies to loopback IP
   - Verifies RevNAT restores remote node IP and NodePort

5. Existing connection handling (UDP)
   - First packet establishes CT entry via legacy path
   - Second packet should use existing CT entry
   - Verifies wildcard lookup is skipped for existing connections

Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):

1. DSR service handling
   - Verifies DSR-enabled service triggers wildcard lookup
   - Packet is DNATed to local backend

2. SNAT service handling
   - Verifies SNAT service does NOT trigger wildcard lookup
   - Packet passes through without DNAT (handled by remote node)

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update base-images

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix

Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
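
As an illustration of the direct construction, a minimal Go sketch follows. `prefixFromKey` and its parameters are hypothetical stand-ins for the key's fields; only the `net/netip` calls are real standard library API.

```go
package main

import (
	"fmt"
	"net/netip"
)

// prefixFromKey builds a netip.Prefix straight from a raw IPv4 address and a
// prefix length, with no intermediate CIDR object. Masked() clears host bits.
func prefixFromKey(addr [4]byte, bits int) netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(addr), bits).Masked()
}

func main() {
	fmt.Println(prefixFromKey([4]byte{10, 1, 2, 3}, 16)) // prints "10.1.0.0/16"
}
```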

This slightly improves pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service:  49 objs, 10042B alloc,  2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service:  59 objs, 10272B alloc,  3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service:  74 objs, 11128B alloc,  4662B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr

This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.

pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects /  99675kB still reachable (per service:  38 objs,  9974B alloc,  2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service:  56 objs, 10212B alloc,  3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service:  74 objs, 11116B alloc,  4662B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: remove unused SkipLBMap delete methods

The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.

Signed-off-by: Tobias Klauser <[email protected]>

* fix(deps): update all go dependencies main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* docs: Update docsearch to v4.5.4

Pull in the latest theme with newer docsearch plugin version.

Signed-off-by: Joe Stringer <[email protected]>

* ci: update docs-builder

Signed-off-by: Cilium Imagebot <[email protected]>

* Use binary.NativeEndian instead of nl.NativeEndian

Use the NativeEndian variable provided by the Go standard library's
encoding/binary package instead of the equivalent from the netlink/nl
package.

While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.

Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")

Signed-off-by: Tobias Klauser <[email protected]>

* datapath: fix panic during datapath reinitialization

This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.

```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0

goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
        /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```

With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.

Fixes: 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")

Signed-off-by: Deepesh Pathak <[email protected]>

* datapath/loader: Add netkit to BPF load tests

Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.

This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.

Signed-off-by: Alasdair McWilliam <[email protected]>

* docs: add netkit requirement to kernel version list

Add Linux kernel requirement for netkit to the System Requirements.

Signed-off-by: Alasdair McWilliam <[email protected]>

* style(bpf/test): fix indentation

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move `icmp_wsum_accumulate` helper

This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move ICMPv6 packet generation to a separate file

The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): reduce ifdef number

Signed-off-by: Andrea Terzolo <[email protected]>

* gateway-api: Update conformance test Make target

This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.

Signed-off-by: Nick Young <[email protected]>

* bpf: introduce DECLARE_CONFIG_KIND

DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.

Signed-off-by: Tobias Klauser <[email protected]>

* bpf: wire events map rate limits through node config

Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* sysdump: Use label selectors for Hubble UI/Relay deployment collection

Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.

This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.

Fixes the issue where:
  cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
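
The difference between a name-based get and a label-based list can be shown with an in-memory model. The `deployment` type and `matchLabels` function are hypothetical and do not use the Kubernetes client; they only mimic equality-based label selection.

```go
package main

import "fmt"

// deployment is a hypothetical, minimal model of a Deployment object.
type deployment struct {
	name   string
	labels map[string]string
}

// matchLabels returns the names of all deployments whose labels contain
// every key/value pair of the selector.
func matchLabels(all []deployment, selector map[string]string) []string {
	var names []string
	for _, d := range all {
		ok := true
		for k, v := range selector {
			if got, found := d.labels[k]; !found || got != v {
				ok = false
				break
			}
		}
		if ok {
			names = append(names, d.name)
		}
	}
	return names
}

func main() {
	all := []deployment{
		{name: "hubble-ui-blue", labels: map[string]string{"k8s-app": "hubble-ui-blue"}},
		{name: "hubble-relay", labels: map[string]string{"k8s-app": "hubble-relay"}},
	}
	// A lookup by the fixed name "hubble-ui" would find nothing here,
	// while the selector finds the custom-named deployment.
	fmt.Println(matchLabels(all, map[string]string{"k8s-app": "hubble-ui-blue"}))
}
```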

Signed-off-by: darox <[email protected]>

* bpf: lxc: remove unnecessary L3 validation

There's no code that uses the IPv4 header afterwards.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: lxc: fine-tune BPF Host Routing path

Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).

By structuring the code as a switch() statement we can also clean up one
of the goto paths.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: xdp: prefer CTX_ACT_TX over XDP_TX

Return the generic value, so that readers understand what macro they should
be using when handling the result.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf, nat46x64: move RFC6052 prefix into node config

This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.

Updates included:

- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
  dropped.

Signed-off-by: viktor-kurchenko <[email protected]>

* neighbor: Fix description for L2 neighbor discovery

The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.

Co-authored-by: Dylan Reimerink <[email protected]>
Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Paul Chaignon <[email protected]>

* CODEOWNERS: add more specific owners for operator subsystems

Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.

Signed-off-by: Tobias Klauser <[email protected]>

* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles

When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.

This commit adds a random suffix to cluster names to prevent the race
condition, and enhances the failure cleanup step to delete CloudFormation
stacks and orphaned IAM roles when cluster creation fails.

Signed-off-by: André Martins <[email protected]>

* hubble: Fix typos in config/set.go

Signed-off-by: harshitghagre <[email protected]>

* test/helpers: ignore error creating lease lock message

This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.

Signed-off-by: André Martins <[email protected]>

* Fix backend slot index mismatch in LB reconciler

Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
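
A simplified Go model of the fix follows. The `backend` type, `assignSlots` and the 1-based slot numbering are assumptions for illustration, not the reconciler's actual code.

```go
package main

import "fmt"

// backend is a hypothetical, minimal backend model.
type backend struct {
	addr        string
	maintenance bool
}

// assignSlots maps slot numbers to backend addresses. The slot counter
// advances only when a backend is actually written, so skipped maintenance
// backends leave no gaps. Using the loop index instead would skip a slot
// for every backend in maintenance.
func assignSlots(backends []backend) map[int]string {
	slots := make(map[int]string)
	slotID := 1 // assumed 1-based slot numbering
	for _, b := range backends {
		if b.maintenance {
			continue
		}
		slots[slotID] = b.addr
		slotID++
	}
	return slots
}

func main() {
	slots := assignSlots([]backend{
		{addr: "10.0.0.1:80"},
		{addr: "10.0.0.2:80", maintenance: true},
		{addr: "10.0.0.3:80"},
	})
	fmt.Println(slots[1], slots[2], len(slots)) // prints "10.0.0.1:80 10.0.0.3:80 2"
}
```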

Signed-off-by: Aman-Cool <[email protected]>

* vendor: Bump to StateDB v0.6.3

This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().

Signed-off-by: Jussi Maki <[email protected]>

* docs: Fix upgrade note category for tproxy

There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.

CC: Alasdair McWilliam <[email protected]>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <[email protected]>

* policy: Fix PASS verdict for non-consecutive tiers

Signed-off-by: Blaz Zupan <[email protected]>

* loadbalancer/healthserver: refresh ProxyRedirect per request

This commit fixes stale ProxyRedirect reads in the health server by
reloading Service state from the services table on each request. This
prevents incorrect local endpoint counts when the Envoy redirect state
changes after the listener is created, which does happen.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: provide WaitForNodeInformation

This commit moves the global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To avoid dependencies on the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.

This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.

This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: pass localnodestore to synchronizer

With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.

This way, the synchronizer can update the ip allocation ranges without
using the global functions.

Signed-off-by: Marco Hofstetter <[email protected]>

* ci: e2e: add `kernel` to workflow job names

As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.

The result will be `Setup & Test (ipsec-1, minor, 5.10)`.

Signed-off-by: Simone Magnani <[email protected]>

* pkg/datapath/bandwidth: optimize host endpoint QoS setup

The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.

This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint

Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
  whether the host endpoint ID has been set, avoiding duplicate
  constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
  during initialization
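
The two changes listed above fit together roughly as in this Go sketch. The `node` and `manager` types are hypothetical, zero is assumed to mean "ID not set", and the sketch runs on a single goroutine; a concurrent version would need to guard the slow path as well.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a hypothetical holder for the host endpoint ID.
type node struct {
	endpointID atomic.Uint64
}

// getEndpointID returns the ID and whether it has been set.
func (n *node) getEndpointID() (uint64, bool) {
	id := n.endpointID.Load()
	return id, id != 0
}

// manager models the one-time host endpoint QoS setup.
type manager struct {
	node        *node
	hostQoSDone atomic.Bool
	setups      int // how often the slow path ran
}

func (m *manager) updateBandwidthLimit() {
	if m.hostQoSDone.Load() {
		return // fast path: no ID lookup, no table access
	}
	if _, ok := m.node.getEndpointID(); !ok {
		return // host endpoint not known yet, retry on a later call
	}
	m.setups++ // stands in for the table lookup and insert
	m.hostQoSDone.Store(true)
}

func main() {
	m := &manager{node: &node{}}
	m.updateBandwidthLimit() // endpoint ID unknown: nothing happens
	m.node.endpointID.Store(42)
	for i := 0; i < 3; i++ {
		m.updateBandwidthLimit()
	}
	fmt.Println("setups:", m.setups) // prints "setups: 1"
}
```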

Signed-off-by: Anand Kumar Shaw <[email protected]>

* clustermesh: fix a few misc issues with MCS-API doc

This commit fixes two issues with the MCS-API doc:
- a simple typo in the enabled keyword
- change the code-block to a parsed-literal, as the |SCM_WEB| "variable"
  was not substituted in the final doc when using a code-block

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* Docs: improve docs around ipsec upgrade in 1.18

Signed-off-by: darox <[email protected]>

* docs(ztunnel): fix duplicate word (a set)

Signed-off-by: Alexis La Goutte <[email protected]>

* docs(ztunnel): add missing backslash

add missing backslash for install with Cilium CLI

Signed-off-by: Alexis La Goutte <[email protected]>

* clustermesh: helm: remove clustermesh.enableMCSAPISupport

This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* daemon: enforce iptable rules are present when node-port is enabled

Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.

We will consider it a fatal error moving forward to disable iptable rule
installation while node-port or eBPF masquerading is enabled.

Signed-off-by: Louis DeLosSantos <[email protected]>

* bpf,nodeport: generalize SNAT conflict detection

Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.

Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.

This removes the dependency on the direct routing interface in the
node-port path.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ztunnel: introduce end to end connectivity tests

The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.

The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.

Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.

Signed-off-by: Louis DeLosSantos <[email protected]>
Signed-off-by: Quang Nguyen <[email protected]>
Signed-off-by: Robin Gögge <[email protected]>

* ci,ztunnel: add workflows for ztunnel encryption tests

Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.

The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ci,ztunnel: add ztunnel cert script to actions

Signed-off-by: Louis DeLosSantos <[email protected]>

* datapath: remove GetRoutePostEncryptMTU()

The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").

Signed-off-by: Julian Wiedmann <[email protected]>

* datapath: ipsec: remove clean up code for encrypt IP rule

https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer/api: include proxy-redirect as backend

Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort
19   [::]:30965/TCP/i            NodePort
21   0.0.0.0:30965/TCP           NodePort
23   0.0.0.0:30965/TCP/i         NodePort
25   10.96.245.249:80/TCP        ClusterIP
26   172.19.255.1:80/TCP         LoadBalancer
27   172.19.255.1:80/TCP/i       LoadBalancer
28   [fd00:10:96::d99f]:80/TCP   ClusterIP
```

Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.

Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`

Result

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort       1 => [::1]:14543/TCP (active)
19   [::]:30965/TCP/i            NodePort       1 => [::1]:14543/TCP (active)
21   0.0.0.0:30965/TCP           NodePort       1 => 127.0.0.1:14543/TCP (active)
23   0.0.0.0:30965/TCP/i         NodePort       1 => 127.0.0.1:14543/TCP (active)
25   10.96.245.249:80/TCP        ClusterIP      1 => 127.0.0.1:14543/TCP (active)
26   172.19.255.1:80/TCP         LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
27   172.19.255.1:80/TCP/i       LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
28   [fd00:10:96::d99f]:80/TCP   ClusterIP      1 => [::1]:14543/TCP (active)
```
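
The backend column for redirected frontends in the output above follows the `<localhost-addr>:<proxy-port>/<proto> (active)` form. A minimal Go sketch of such formatting, with the hypothetical helper `proxyBackend`:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// proxyBackend renders a proxy redirect as a pseudo-backend string, using
// the loopback address of the frontend's address family.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	host := "127.0.0.1"
	if ipv6 {
		host = "::1"
	}
	// JoinHostPort adds the brackets that IPv6 literals need.
	hostPort := net.JoinHostPort(host, strconv.Itoa(int(proxyPort)))
	return fmt.Sprintf("%s/%s (active)", hostPort, proto)
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP")) // prints "127.0.0.1:14543/TCP (active)"
	fmt.Println(proxyBackend(true, 14543, "TCP"))  // prints "[::1]:14543/TCP (active)"
}
```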

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/ipcachelistener: use injected localnodestore

This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/linuxnodehandler: retrieve node ips from localnodestore

This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.

Signed-off-by: Marco Hofstetter <[email protected]>

* identity/cache: use injected localnodestore

This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* node/address: remove global functions `GetIP[v4/v6]`

This commit removes the unused global functions `GetIPv4` & `GetIPv6`.

Signed-off-by: Marco Hofstetter <[email protected]>

* test: remove K8sDatapathBandwidthTest

The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.

Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <[email protected]>

* wireguard: remove cleanup code for old userspace devices

2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, so we can now stop cleaning up any network device
created for that mode.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer: Check for equality and skip insert when not changed

This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.

As we no longer forcefully update the frontend, the orphan detection has to actually
build a set to check if a frontend has been removed.
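
The two ideas above can be sketched as follows. This is only an illustration under stated assumptions: `Service`, `Table`, `Upsert` and `Orphans` are hypothetical names, and `reflect.DeepEqual` stands in for the dedicated DeepEqual() implementations mentioned in this commit.

```go
package main

import (
	"fmt"
	"reflect"
)

// Service is a hypothetical, simplified stand-in for the load-balancer service object.
type Service struct {
	Name   string
	Labels map[string]string
}

// Table is a hypothetical in-memory store keyed by service name.
type Table struct {
	services map[string]Service
	writes   int // number of inserts that were actually performed
}

// Upsert stores svc unless an equal object is already present.
// It reports whether a write happened.
func (t *Table) Upsert(svc Service) bool {
	if old, ok := t.services[svc.Name]; ok && reflect.DeepEqual(old, svc) {
		return false // nothing changed, skip the insert
	}
	t.services[svc.Name] = svc
	t.writes++
	return true
}

// Orphans returns the stored frontend addresses that are absent from desired.
// A set is built explicitly because unchanged frontends are no longer rewritten.
func Orphans(stored, desired []string) []string {
	want := make(map[string]struct{}, len(desired))
	for _, d := range desired {
		want[d] = struct{}{}
	}
	var out []string
	for _, s := range stored {
		if _, ok := want[s]; !ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	t := &Table{services: map[string]Service{}}
	svc := Service{Name: "default/echo", Labels: map[string]string{"app": "echo"}}
	fmt.Println(t.Upsert(svc)) // first insert is written
	fmt.Println(t.Upsert(svc)) // unchanged object is skipped
	fmt.Println(Orphans([]string{"10.0.0.1:80", "10.0.0.2:80"}, []string{"10.0.0.1:80"}))
}
```

Skipping the write is what the `_Unchanged` benchmark above exercises; the cost is the extra equality check on every upsert.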

Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6           3549            317691 ns/op            314771 objects/sec
BenchmarkInsertBackend-6                            2818            423975 ns/op            235863 objects/sec
BenchmarkReplaceBackend-6                         326682              3793 ns/op            263669 objects/sec
BenchmarkReplaceService-6                        2327074               509.4 ns/op         1963230 objects/sec

After:
Benchmark_UpsertServiceAndFrontends_100-6                   3464            331791 ns/op            301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6        14652             81250 ns/op           1230766 objects/sec
BenchmarkInsertBackend-6                                    2956            401100 ns/op            249315 objects/sec
BenchmarkReplaceBackend-6                                3402430               360.9 ns/op         2771038 objects/sec
BenchmarkReplaceService-6                                2068555               556.6 ns/op         1796743 objects/sec

Signed-off-by: Jussi Maki <[email protected]>

* loadbalancer: Remove dummy ingress endpoint workaround

Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* operator/helm: Remove creation of dummy ingress endpoint

With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.

Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`s. The creation of the `EndpointSlice` itself
will be done in a separate PR.

Fixes: #19262

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* monitor: report 3rd argument in DBG_GENERIC debug monitor messages

Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)

Also report the 3rd argument, so the monitor message will look as
follows:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
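
A minimal sketch of the resulting message format, assuming a hypothetical helper `formatGeneric` (this is not the monitor's actual function, only an illustration of the decimal-plus-hex layout shown above):

```go
package main

import "fmt"

// formatGeneric is a hypothetical helper that renders three debug arguments
// in the same "decimal (hex)" layout as the monitor message above.
func formatGeneric(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	// Same values as in the example message of this commit.
	fmt.Println(formatGeneric(238, 34214060, 170))
}
```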

Signed-off-by: Tobias Klauser <[email protected]>

* policy: cleanup label selector parsing and validation

This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.

With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.

This is not a functional change and does not have any associated user
impact.

Signed-off-by: Deepesh Pathak <[email protected]>

* helm/ztunnel: bind health check to localhost

Security hardening for ztunnel running with hostNetwork: true:

Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).

Signed-off-by: Quang Nguyen <[email protected]>

* ci:wireguard: enable Host Firewall in native routing e2e tests

This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.

Signed-off-by: Simone Magnani <[email protected]>

* mcsapi: Add namespace filtering conditions to ServiceImport controller

Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
  by setting SupportedIPFamilies annotation to empty

This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.

Signed-off-by: Jacques Massa <[email protected]>

* docs: split up network policy language page

Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.

Signed-off-by: Daniel Maslowski <[email protected]>

* golangci-lint: fix and simplify golangci-lint.sh

golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.

Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.

Signed-off-by: Timo Beckers <[email protected]>

* golangci-lint: split kubeapi configuration into separate file

The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.

VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.

Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.

Signed-off-by: Timo Beckers <[email protected]>

* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add parser function for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add flags to enable or disable Cluster Network Policy

Disabled by default. A new Makefile target is added that enables it in kind clusters.

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add watcher for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier

Signed-off-by: Blaz Zupan <[email protected]>

* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description

Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.

Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.

Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").

Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>

* nodemap: converted net.IP to netip.Addr, Part of #24246

- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context

Signed-off-by: Sanjeevliv <[email protected]>

* bpf/tests: fix byte ordering for TCP seq/win values

The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.

With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.

This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.

This causes all affected BPF tests to fail. This will be addressed
in the next commit.

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix TCP checksum assertions in all tests

This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.

As part of this change, test failure logs have been updated to correctly
show value observed in a packet, and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix default_data definition for scapy tests

The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.

The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.

As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.

This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.

Signed-off-by: Alasdair McWilliam <[email protected]>

* endpoint, fqdn: remove restoration of deprecated V1 DNSRules

Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.

Signed-off-by: Tobias Klauser <[email protected]>

* endpoint: rename DNS rules field

The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.

Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.

Signed-off-by: Tobias Klauser <[email protected]>

* tests: Ignore identity manager related error in versions < 1.18

Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>

* metrics: remove agent bootstrap metrics

This commit removes the deprecated agent bootstrap metrics.

Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.

Signed-off-by: Marco Hofstetter <[email protected]>

* policy: fix policy tests

This fixes a policy test breakage caused by a recent change to how the
label source is handled.

Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <[email protected]>

* resource/test: let TestResource_WithFakeClient set resource version

Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* cid/test: let TestUpdatePodLabels set resource version

Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version

Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* bgp/test: correctly set resource version when updating test resources

Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* test/controlplane: adaptation for optimistic concurrency control

Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: fix resource version configuration in tracker

Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: let update operations respect resource versioning

Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.

Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.

Signed-off-by: Marco Iorio <[email protected]>

* chore(deps): update base-images to v1.26.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* chore(deps): update cilium/cilium-cli action to v0.19.1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[b…
jiashengz pushed a commit to Roblox/cilium that referenced this pull request Feb 23, 2026
* ci: e2e: enhance readability of workflow job name

This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.

Prior to this, the name of each job contained the whole matrix combination,
which was cut off and unreadable in the UI. Given that we now
use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.

The result will be `Setup & Test (ipsec-1, minor)`.

Signed-off-by: Simone Magnani <[email protected]>

* ci: e2e: log matrix configuration in each job

This commit adds, as the first step of the `Setup & Test` job for e2e-upgrade,
a simple step to dump the current matrix configuration being tested.

The previous commit modified the title to display only the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration, which would be truncated in the UI anyway.

It is true that, given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the required configuration, but I think that
having a step where we dump it would speed up and ease debugging in CI.

The output would be similar to:

```
> Log Matrix Configuration
Current matrix configuration:
{
  "name": "wireguard-1",
  "kernel": "5.10",
  "kube-proxy": "iptables",
  "kpr": "true",
  "devices": "{eth0,eth1}",
  "secondary-network": "true",
  "tunnel": "vxlan",
  "encryption": "wireguard",
  "encryption-node": "false",
  "lb-mode": "snat",
  "endpoint-routes": "true",
  "egress-gateway": "true",
  "ingress-controller": "true",
  "mode": "minor"
}
```

Signed-off-by: Simone Magnani <[email protected]>

* endpoint: Log labels as structured JSON objects

Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.

Signed-off-by: Jie WU <[email protected]>

```release-note
endpoint: Log labels as structured JSON objects
```

* feat(helm): hubble-ui containers set to pss-restricted

This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`

Signed-off-by: Pat Riehecky <[email protected]>

* helm: allow multicluster-services installCRDs to update CRDs

Previously cilium-operator failed to start if MCS/installCRDs is enabled
because it does not have permissions to update the CRD with this log
message:

level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"

This patch adds the necessary permissions to cilium-operator if you have
mcs/installCRDs enabled.

Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")

Signed-off-by: Florian Ströger <[email protected]>

* policy: (mechanical) refactor out flow lookup types

A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.

This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: add a simple iterative policy simulator

This is a simple userspace tool that executes rules step-by-step. Its
purpose will be to validate more complex policy scenarios, ideally by
fuzzing.

To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.

Signed-off-by: Casey Callendrello <[email protected]>

* Add fuzz-based policy testing

This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: Fix fuzz testing

Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.

Fix the order of expected and actual for require.Equal.

Add more debugging.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Hide precedence details better

Hide precedence details from the policymap package.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add optional indexing by identity

Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add default deny rule when pass verdict is used

Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.

The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Fix L7Filter precedence handling for pass verdicts

Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before.  Would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.
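
The ordering described above (deny over allow over pass) can be sketched as follows. The `Verdict` type, its constants and the `rank` helper are hypothetical; the real HasPrecedenceOver() may have a different receiver and signature. Allow is kept as the zero value, as the commit notes.

```go
package main

import "fmt"

// Verdict is a hypothetical verdict type with Allow as the zero value.
type Verdict uint8

const (
	Allow Verdict = iota
	Deny
	Pass
)

// rank maps a verdict to its precedence; a higher rank wins. An explicit
// mapping is needed because the numeric values do not follow the ordering.
func (v Verdict) rank() int {
	switch v {
	case Deny:
		return 2
	case Allow:
		return 1
	default: // Pass
		return 0
	}
}

// HasPrecedenceOver reports whether v strictly outranks other.
func (v Verdict) HasPrecedenceOver(other Verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(Deny.HasPrecedenceOver(Allow)) // deny beats allow
	fmt.Println(Allow.HasPrecedenceOver(Pass)) // allow beats pass
	fmt.Println(Pass.HasPrecedenceOver(Deny))  // pass never beats deny
}
```

A plain IsDeny() check can only distinguish deny from everything else, which is why a three-way comparison is needed once pass verdicts exist.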

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Fix per-tier priority range allocation

Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add Fuzz cases

Commit fuzzer cases found during development.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add generated L3/4 entries, multipass support

A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.

We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.

The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.

To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.

If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.

When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.

Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only calling it if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.

Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.

Signed-off-by: Jarno Rajahalme <[email protected]>

* pkg/subnet: Fix tag in config subnets field

The Subnets field in the config was declaring a json tag, leading to a
failure of the agent `hive` command (see below). This is due to the fact
that the hive relies on a mapstructure Decoder, not a JSON one, and
therefore requires a mapstructure tag when the config field name is not
equal to the flag name.

Fix the tag on the field using a mapstructure one.
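
The difference between the two tags can be illustrated with the standard `reflect` package alone. This sketch does not use the mapstructure library; `keyFor` is a hypothetical helper that mimics, in simplified form, how a tag-driven decoder picks the key for a field, and the flag name `example-subnets` is made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// configWithJSONTag mirrors the problematic shape: only a json tag is set.
type configWithJSONTag struct {
	Subnets []string `json:"example-subnets"`
}

// configWithMapstructureTag mirrors the fixed shape.
type configWithMapstructureTag struct {
	Subnets []string `mapstructure:"example-subnets"`
}

// keyFor returns the key a decoder that only reads tagName would use for the
// given field: the tag value if present, otherwise the Go field name.
func keyFor(cfg any, field, tagName string) string {
	f, ok := reflect.TypeOf(cfg).FieldByName(field)
	if !ok {
		return ""
	}
	if name := f.Tag.Get(tagName); name != "" {
		return name
	}
	return f.Name
}

func main() {
	// The json tag is invisible to a decoder that looks for "mapstructure".
	fmt.Println(keyFor(configWithJSONTag{}, "Subnets", "mapstructure"))
	// With the right tag, the field is associated with the flag name.
	fmt.Println(keyFor(configWithMapstructureTag{}, "Subnets", "mapstructure"))
}
```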

```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]

goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
	/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
	/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```

Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <[email protected]>

* linux-desired-device: introducing Cilium managed devices reconciler

Add a new reconciler in Cilium datapath/linux which can manage the
lifecycle of Linux links created by Cilium.

Created devices are persisted on disk using a write-ahead log; upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices will be pruned.

Implementation is inspired by linux/route/reconciler.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: script tests for desired-devices

Adding script tests to validate device creation and persistence.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: hook desired-devices cell into main infra

Signed-off-by: harsimran pabla <[email protected]>

* fix: helm intervalSeconds value render bug

intervalSeconds is always of an integral type, no need to check kindIs float64

Fixes: #44206
Signed-off-by: jayl1e <[email protected]>

* bpf: source tuple hash seeds from node config

Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* feat(install) Allow hubble to run with hostUsers: false

The hubble components do not require direct mapping of container
users to system users.

Signed-off-by: Pat Riehecky <[email protected]>

* docs: fix typos in comments

Signed-off-by: Yohei Yamamoto <[email protected]>

* bpf: correct comments in cil_from_netdev function

Removed conditions from the comment block describing the cil_from_netdev function since the logic has been changed

Signed-off-by: Liyi Huang <[email protected]>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* bpf: lb: Decouple DNAT operation from LB key

Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.

This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request

The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)

By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Handle DSR for remote NodePort services on source node

When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.

The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.

To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.

Implementation details:

- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
  defined and the regular service lookup fails, check if the destination
  is a remote node's IP with a NodePort port range. If so, perform a
  wildcard lookup (address=0) to find the NodePort service.

- Use wildcard key for backend selection: When dsr_internal flag is set,
  set key->address to 0 before calling lb4/6_select_backend_id(). This
  applies to both CT_NEW (new connections) and CT_REPLY (backend
  re-selection for existing connections). This is needed for backend
  selection algorithms that use slot lookup (e.g., Random), which look
  up backend slots via lb4/6_lookup_backend_slot() using the service
  key. Without a wildcard key, the lookup would fail because backend
  slot entries are stored with the wildcard service key, not with the
  remote node's IP.

- Store original destination in CT entry: The original destination
  address and port (remote node IP and NodePort) are stored in
  ct_state_new.nat_addr/nat_port, which will be written to the CT entry
  for use in reply path RevNAT processing.

- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
  __per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
  when the original destination info is no longer available in the packet.
  The per-CPU buffer preserves this info across the DNAT operation.

Existing connection handling:

- This change only affects DSR traffic destined to remote node's
  NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
  fails, but it only processes packets where the destination is a remote
  node IP with a port in the NodePort range. Other traffic that fails
  the regular lookup is unaffected.
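
The lookup order described above can be modelled in a few lines of Go. This is only a sketch of the control flow, not the BPF code: the `svcKey`, `lbTable`, and `demoLookup` names, the NodePort range constants, and the use of a Go map are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// svcKey is a hypothetical service key: frontend address plus port.
type svcKey struct {
	addr netip.Addr
	port uint16
}

// Assumed NodePort range, for illustration only.
const (
	nodePortMin = 30000
	nodePortMax = 32767
)

type lbTable struct {
	services    map[svcKey]string   // service key -> service name
	remoteNodes map[netip.Addr]bool // IPs known to belong to remote nodes
}

// lookup tries the exact key first. On a miss it retries with a zeroed
// (wildcard) address, but only when the destination is a remote node IP
// and the port lies in the NodePort range.
func (t *lbTable) lookup(dst netip.Addr, port uint16) (string, bool) {
	if svc, ok := t.services[svcKey{addr: dst, port: port}]; ok {
		return svc, true
	}
	if !t.remoteNodes[dst] || port < nodePortMin || port > nodePortMax {
		return "", false
	}
	svc, ok := t.services[svcKey{addr: netip.IPv4Unspecified(), port: port}]
	return svc, ok
}

// demoLookup runs lookup against a small fixed table.
func demoLookup(dst string, port uint16) (string, bool) {
	t := &lbTable{
		services: map[svcKey]string{
			svcKey{addr: netip.IPv4Unspecified(), port: 31000}:      "nodeport-svc",
			svcKey{addr: netip.MustParseAddr("10.96.0.10"), port: 53}: "dns",
		},
		remoteNodes: map[netip.Addr]bool{netip.MustParseAddr("192.0.2.20"): true},
	}
	return t.lookup(netip.MustParseAddr(dst), port)
}

func main() {
	fmt.Println(demoLookup("192.0.2.20", 31000)) // remote node + NodePort: wildcard hit
	fmt.Println(demoLookup("192.0.2.99", 31000)) // not a remote node: miss
}
```

Traffic that misses the exact lookup and does not match both conditions falls through unchanged, which is the property the last bullet relies on.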

Fixes: #41962

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Add RevNAT support for DSR remote NodePort connections

Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.

Implementation details:

- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
  When nat_port is non-zero, use these values directly for RevNAT.
  When nat_port is zero, fall back to the existing rev_nat_index lookup
  for backward compatibility with existing connections.

- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
  CT entry to ct_state, making them available for reply processing.

- Update ipv4/6_policy(): Check for nat_port in addition to
  rev_nat_index when deciding whether to perform RevNAT. Pass the
  CT entry's NAT information to lb4/6_rev_nat().

- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
  signature by passing NULL/0 for nat_addr/nat_port (these paths use
  the traditional rev_nat_index lookup).

Existing connection handling:

1. New connections (created after this patch):
   - Forward path stores nat_addr/nat_port in CT entry
   - Reply path uses CT entry's nat_addr/nat_port for RevNAT

2. Existing connections (regular NodePort/DSR traffic):
   - CT entry has nat_addr=0, nat_port=0
   - lb4/6_rev_nat checks nat_port first:
     - If nat_port != 0: use nat_addr/nat_port directly
     - If nat_port == 0: fall back to rev_nat_index lookup
   - This ensures existing connections continue to work

3. Upgrade scenario:
   - Existing connections keep working via rev_nat_index fallback
   - New DSR remote nodeport connections use nat_addr/nat_port
   - No connection disruption during rolling upgrade
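
The fallback rule above (prefer nat_addr/nat_port, else use rev_nat_index) can be sketched in Go as follows. The `ctState`, `revNatEntry`, and `revNatTarget` names are hypothetical and the map stands in for the RevNAT BPF map; only the decision order is taken from the description.

```go
package main

import "fmt"

// ctState holds a hypothetical subset of the CT state used on the reply path.
type ctState struct {
	natAddr     [4]byte
	natPort     uint16
	revNatIndex uint16
}

// revNatEntry is the address and port restored on a reply packet.
type revNatEntry struct {
	addr [4]byte
	port uint16
}

// revNatTarget picks the RevNAT target. A non-zero natPort means the CT
// entry carries the original destination itself; otherwise the legacy
// revNatIndex lookup is used, so older entries keep working.
func revNatTarget(st ctState, table map[uint16]revNatEntry) (revNatEntry, bool) {
	if st.natPort != 0 {
		return revNatEntry{addr: st.natAddr, port: st.natPort}, true
	}
	e, ok := table[st.revNatIndex]
	return e, ok
}

func main() {
	table := map[uint16]revNatEntry{7: {addr: [4]byte{10, 96, 0, 1}, port: 443}}
	// New DSR remote NodePort connection: values come from the CT entry.
	fmt.Println(revNatTarget(ctState{natAddr: [4]byte{192, 0, 2, 20}, natPort: 31000}, table))
	// Pre-existing connection: falls back to the index lookup.
	fmt.Println(revNatTarget(ctState{revNatIndex: 7}, table))
}
```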

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: tests: Add tests for DSR remote NodePort handling

Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.

Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):

1. Pod -> Remote NodePort -> Local backend (forward path)
   - Client pod sends packet to remote node's NodePort
   - LB selects a backend on the local node (same node as client)
   - Verifies packet is DNATed to local backend IP and port
   - Verifies CT entry contains correct nat_addr (remote node IP)
     and nat_port (NodePort)

2. Local backend -> Pod (reply path)
   - Backend sends reply packet to client
   - Verifies RevNAT is applied correctly
   - Source IP/port changed to remote node IP and NodePort

3. Pod -> Remote NodePort -> Self (hairpin)
   - Client pod sends packet to remote node's NodePort
   - LB selects the client pod itself as the backend
   - Verifies DNAT to client IP and backend port
   - Verifies SNAT to loopback IP for hairpin flow

4. Hairpin reply
   - Pod replies to loopback IP
   - Verifies RevNAT restores remote node IP and NodePort

5. Existing connection handling (UDP)
   - First packet establishes CT entry via legacy path
   - Second packet should use existing CT entry
   - Verifies wildcard lookup is skipped for existing connections

Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):

1. DSR service handling
   - Verifies DSR-enabled service triggers wildcard lookup
   - Packet is DNATed to local backend

2. SNAT service handling
   - Verifies SNAT service does NOT trigger wildcard lookup
   - Packet passes through without DNAT (handled by remote node)

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update base-images

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix

Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.

This slightly improves pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service:  49 objs, 10042B alloc,  2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service:  59 objs, 10272B alloc,  3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service:  74 objs, 11128B alloc,  4662B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)
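
For illustration, building the netip.Prefix straight from the key bytes might look like the sketch below. The `sourceRangeKey4` type and its `prefix` method are hypothetical stand-ins, not the actual map key type; only `netip.AddrFrom4` and `netip.PrefixFrom` are real standard-library calls.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical stand-in for an IPv4 source-range map key.
type sourceRangeKey4 struct {
	prefixLen uint32
	address   [4]byte
}

// prefix builds the netip.Prefix directly from the raw key fields,
// without going through an intermediate CIDR object.
func (k sourceRangeKey4) prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.address), int(k.prefixLen))
}

func main() {
	k := sourceRangeKey4{prefixLen: 16, address: [4]byte{10, 1, 0, 0}}
	fmt.Println(k.prefix()) // 10.1.0.0/16
}
```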

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr

This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.

pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects /  99675kB still reachable (per service:  38 objs,  9974B alloc,  2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service:  56 objs, 10212B alloc,  3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service:  74 objs, 11116B alloc,  4662B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: remove unused SkipLBMap delete methods

The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.

Signed-off-by: Tobias Klauser <[email protected]>

* fix(deps): update all go dependencies main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* docs: Update docsearch to v4.5.4

Pull in the latest theme with newer docsearch plugin version.

Signed-off-by: Joe Stringer <[email protected]>

* ci: update docs-builder

Signed-off-by: Cilium Imagebot <[email protected]>

* Use binary.NativeEndian instead of nl.NativeEndian

Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.

While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.
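
A minimal sketch of the pattern, assuming the handle bytes encode an 8-byte native-endian ID (the size and the `handleID` helper are assumptions for illustration, not taken from the commit):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShortHandle = errors.New("handle too short")

// handleID decodes a native-endian uint64 from the start of raw handle
// bytes. It rejects short input instead of indexing out of range.
func handleID(b []byte) (uint64, error) {
	if len(b) < 8 {
		return 0, errShortHandle
	}
	return binary.NativeEndian.Uint64(b), nil
}

func main() {
	buf := binary.NativeEndian.AppendUint64(nil, 42)
	fmt.Println(handleID(buf)) // 42 <nil>
	fmt.Println(handleID(nil)) // 0 handle too short
}
```

binary.NativeEndian is available in the standard library from Go 1.21 onwards.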

Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")

Signed-off-by: Tobias Klauser <[email protected]>

* datapath: fix panic during datapath reinitialization

This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.

```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0

goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
        /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```

With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.

Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")

Signed-off-by: Deepesh Pathak <[email protected]>

* datapath/loader: Add netkit to BPF load tests

Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.

This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.

Signed-off-by: Alasdair McWilliam <[email protected]>

* docs: add netkit requirement to kernel version list

Add Linux kernel requirement for netkit to the System Requirements.

Signed-off-by: Alasdair McWilliam <[email protected]>

* style(bpf/test): fix indentation

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move `icmp_wsum_accumulate` helper

This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move ICMPv6 packet generation to a separate file

The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): reduce ifdef number

Signed-off-by: Andrea Terzolo <[email protected]>

* gateway-api: Update conformance test Make target

This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.

Signed-off-by: Nick Young <[email protected]>

* bpf: introduce DECLARE_CONFIG_KIND

DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.

Signed-off-by: Tobias Klauser <[email protected]>

* bpf: wire events map rate limits through node config

Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* sysdump: Use label selectors for Hubble UI/Relay deployment collection

Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.

This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.

Fixes the issue where:
  cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.

Signed-off-by: darox <[email protected]>

* bpf: lxc: remove unnecessary L3 validation

There's no code that uses the IPv4 header afterwards.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: lxc: fine-tune BPF Host Routing path

Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).

By structuring the code as a switch() statement we can also clean up one
of the goto paths.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: xdp: prefer CTX_ACT_TX over XDP_TX

Return the generic value, so that readers understand what macro they should
be using when handling the result.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf, nat46x64: move RFC6052 prefix into node config

This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.

Updates included:

- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
  dropped.

Signed-off-by: viktor-kurchenko <[email protected]>

* neighbor: Fix description for L2 neighbor discovery

The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.

Co-authored-by: Dylan Reimerink <[email protected]>
Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Paul Chaignon <[email protected]>

* CODEOWNERS: add more specific owners for operator subsystems

Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.

Signed-off-by: Tobias Klauser <[email protected]>

* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles

When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.

This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.

Signed-off-by: André Martins <[email protected]>

* hubble: Fix typos in config/set.go

Signed-off-by: harshitghagre <[email protected]>

* test/helpers: ignore error creating lease lock message

This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.

Signed-off-by: André Martins <[email protected]>

* Fix backend slot index mismatch in LB reconciler

Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
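
The difference between the loop index and a dedicated slot counter can be illustrated with a small Go sketch. The `backend` type, the `assignSlots` helper, and the choice to number slots from 1 are assumptions for illustration, not the reconciler's actual code.

```go
package main

import "fmt"

// backend is a hypothetical, trimmed-down backend record.
type backend struct {
	id          uint32
	maintenance bool
}

// assignSlots maps slot numbers to backend IDs. The slot counter only
// advances for backends that are actually written, so skipping a
// maintenance backend leaves no gap in the slot sequence.
func assignSlots(backends []backend) map[int]uint32 {
	slots := make(map[int]uint32)
	slotID := 1
	for _, be := range backends {
		if be.maintenance {
			continue // skipped backends must not consume a slot
		}
		slots[slotID] = be.id
		slotID++
	}
	return slots
}

func main() {
	bes := []backend{{id: 1}, {id: 2, maintenance: true}, {id: 3}}
	fmt.Println(assignSlots(bes)) // map[1:1 2:3]
}
```

Using the loop index instead would have placed backend 3 in slot 3 and left slot 2 empty.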

Signed-off-by: Aman-Cool <[email protected]>

* vendor: Bump to StateDB v0.6.3

This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().

Signed-off-by: Jussi Maki <[email protected]>

* docs: Fix upgrade note category for tproxy

There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.

CC: Alasdair McWilliam <[email protected]>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <[email protected]>

* policy: Fix PASS verdict for non-consecutive tiers

Signed-off-by: Blaz Zupan <[email protected]>

* loadbalancer/healthserver: refresh ProxyRedirect per request

This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created (which does happen in practice).

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: provide WaitForNodeInformation

This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.

This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.

This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: pass localnodestore to synchronizer

With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.

This way, the synchronizer can update the ip allocation ranges without
using the global functions.

Signed-off-by: Marco Hofstetter <[email protected]>

* ci: e2e: add `kernel` to workflow job names

As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.

The result will be `Setup & Test (ipsec-1, minor, 5.10)`.

Signed-off-by: Simone Magnani <[email protected]>

* pkg/datapath/bandwidth: optimize host endpoint QoS setup

The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.

This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint

Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
  whether the host endpoint ID has been set, avoiding duplicate
  constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
  during initialization
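
A rough sketch of the fast path, assuming a zero ID means "not set yet" and using hypothetical `node` and `manager` types (the real bandwidth manager and node package differ in detail):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node holds the host endpoint ID; zero is assumed to mean "not set yet".
type node struct {
	endpointID atomic.Uint64
}

// getEndpointID reports the ID and whether it has been set.
func (n *node) getEndpointID() (uint64, bool) {
	id := n.endpointID.Load()
	return id, id != 0
}

type manager struct {
	node        *node
	hostQoSDone atomic.Bool  // set once the host endpoint QoS is configured
	hostSetups  atomic.Int64 // counts one-time setups, for demonstration
}

// updateBandwidthLimit performs the host endpoint setup at most once.
// After that, the atomic flag short-circuits the ID lookup entirely.
func (m *manager) updateBandwidthLimit() {
	if !m.hostQoSDone.Load() {
		if _, ok := m.node.getEndpointID(); ok && m.hostQoSDone.CompareAndSwap(false, true) {
			m.hostSetups.Add(1) // stands in for the table lookup and insert
		}
	}
	// Per-endpoint bandwidth handling would follow here.
}

func main() {
	m := &manager{node: &node{}}
	m.updateBandwidthLimit() // ID not set yet: nothing to configure
	m.node.endpointID.Store(42)
	m.updateBandwidthLimit() // first call with an ID: setup runs
	m.updateBandwidthLimit() // fast path: setup skipped
	fmt.Println(m.hostSetups.Load()) // 1
}
```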

Signed-off-by: Anand Kumar Shaw <[email protected]>

* clustermesh: fix a few misc issues with MCS-API doc

This commit fixes two issues with the MCS-API doc:
- a simple typo in the enabled keyword
- change the code-block to parsed-literal, as the |SCM_WEB| "variable"
  was not evaluated/replaced in the final doc with a code-block

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* Docs: improve docs around ipsec upgrade in 1.18

Signed-off-by: darox <[email protected]>

* docs(ztunnel): fix duplicate word (a set)

Signed-off-by: Alexis La Goutte <[email protected]>

* docs(ztunnel): add missing backslash

add missing backslash for install with Cilium CLI

Signed-off-by: Alexis La Goutte <[email protected]>

* clustermesh: helm: remove clustermesh.enableMCSAPISupport

This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* daemon: enforce iptables rules are present when node-port is enabled

Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.

We will consider it a fatal error moving forward to disable iptables rule
installation while node-port or eBPF masquerading is enabled.

Signed-off-by: Louis DeLosSantos <[email protected]>

* bpf,nodeport: generalize SNAT conflict detection

Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.

Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.

This removes the dependency on the direct routing interface in the
node-port path.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ztunnel: introduce end to end connectivity tests

The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.

The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.

Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.

Signed-off-by: Louis DeLosSantos <[email protected]>
Signed-off-by: Quang Nguyen <[email protected]>
Signed-off-by: Robin Gögge <[email protected]>

* ci,ztunnel: add workflows for ztunnel encryption tests

Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.

The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ci,ztunnel: add ztunnel cert script to actions

Signed-off-by: Louis DeLosSantos <[email protected]>

* datapath: remove GetRoutePostEncryptMTU()

The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").

Signed-off-by: Julian Wiedmann <[email protected]>

* datapath: ipsec: remove clean up code for encrypt IP rule

https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer/api: include proxy-redirect as backend

Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort
19   [::]:30965/TCP/i            NodePort
21   0.0.0.0:30965/TCP           NodePort
23   0.0.0.0:30965/TCP/i         NodePort
25   10.96.245.249:80/TCP        ClusterIP
26   172.19.255.1:80/TCP         LoadBalancer
27   172.19.255.1:80/TCP/i       LoadBalancer
28   [fd00:10:96::d99f]:80/TCP   ClusterIP
```

Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.

Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`

Result

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort       1 => [::1]:14543/TCP (active)
19   [::]:30965/TCP/i            NodePort       1 => [::1]:14543/TCP (active)
21   0.0.0.0:30965/TCP           NodePort       1 => 127.0.0.1:14543/TCP (active)
23   0.0.0.0:30965/TCP/i         NodePort       1 => 127.0.0.1:14543/TCP (active)
25   10.96.245.249:80/TCP        ClusterIP      1 => 127.0.0.1:14543/TCP (active)
26   172.19.255.1:80/TCP         LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
27   172.19.255.1:80/TCP/i       LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
28   [fd00:10:96::d99f]:80/TCP   ClusterIP      1 => [::1]:14543/TCP (active)
```
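
The backend column for redirected frontends follows the `<localhost-addr>:<proxy-port>/<proto> (active)` form. A hedged Go sketch of how such a string could be assembled (the `proxyBackend` helper is hypothetical and not the actual model-generation code):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// proxyBackend renders a local proxy redirect as a pseudo-backend string.
// The loopback address is chosen to match the frontend's address family.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	host := "127.0.0.1"
	if ipv6 {
		host = "::1"
	}
	// JoinHostPort adds the brackets required around IPv6 literals.
	return net.JoinHostPort(host, strconv.Itoa(int(proxyPort))) + "/" + proto + " (active)"
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP")) // 127.0.0.1:14543/TCP (active)
	fmt.Println(proxyBackend(true, 14543, "TCP"))  // [::1]:14543/TCP (active)
}
```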

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/ipcachelistener: use injected localnodestore

This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/linuxnodehandler: retrieve node ips from localnodestore

This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.

Signed-off-by: Marco Hofstetter <[email protected]>

* identity/cache: use injected localnodestore

This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* node/address: remove global functions `GetIP[v4/v6]`

This commit removes the unused global functions `GetIPv4` & `GetIPv6`.

Signed-off-by: Marco Hofstetter <[email protected]>

* test: remove K8sDatapathBandwidthTest

The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.

Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <[email protected]>

* wireguard: remove cleanup code for old userspace devices

2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, we can now stop cleaning up any network device
created for that mode.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer: Check for equality and skip insert when not changed

This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.

As we no longer forcefully update the frontend, the orphan detection has to
actually build a set to check if a frontend has been removed.
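
A minimal sketch of the skip-when-unchanged idea, assuming a simplified
`Service` type and an in-memory `Table`. Both are hypothetical stand-ins for
illustration and not the actual load-balancer types.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for a load-balancer
// service object.
type Service struct {
	Name   string
	Labels map[string]string
}

// DeepEqual reports whether two services carry the same content.
func (s *Service) DeepEqual(other *Service) bool {
	if s == nil || other == nil {
		return s == other
	}
	if s.Name != other.Name || len(s.Labels) != len(other.Labels) {
		return false
	}
	for k, v := range s.Labels {
		if ov, ok := other.Labels[k]; !ok || ov != v {
			return false
		}
	}
	return true
}

// Table is a hypothetical in-memory store keyed by service name.
type Table struct {
	services map[string]*Service
	writes   int // number of inserts actually performed
}

func NewTable() *Table { return &Table{services: map[string]*Service{}} }

// Upsert stores svc and returns true only if the stored state changed.
func (t *Table) Upsert(svc *Service) bool {
	if old, ok := t.services[svc.Name]; ok && old.DeepEqual(svc) {
		return false // unchanged: skip the insert
	}
	t.services[svc.Name] = svc
	t.writes++
	return true
}

func main() {
	tbl := NewTable()
	fmt.Println(tbl.Upsert(&Service{Name: "a", Labels: map[string]string{"k": "v"}}))
	fmt.Println(tbl.Upsert(&Service{Name: "a", Labels: map[string]string{"k": "v"}}))
	fmt.Println(tbl.Upsert(&Service{Name: "a", Labels: map[string]string{"k": "w"}}))
}
```

The equality check costs a comparison on every upsert, which is consistent
with the small slowdown in the changed-object benchmark and the large gain
in the unchanged one.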

Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6           3549            317691 ns/op            314771 objects/sec
BenchmarkInsertBackend-6                            2818            423975 ns/op            235863 objects/sec
BenchmarkReplaceBackend-6                         326682              3793 ns/op            263669 objects/sec
BenchmarkReplaceService-6                        2327074               509.4 ns/op         1963230 objects/sec

After:
Benchmark_UpsertServiceAndFrontends_100-6                   3464            331791 ns/op            301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6        14652             81250 ns/op           1230766 objects/sec
BenchmarkInsertBackend-6                                    2956            401100 ns/op            249315 objects/sec
BenchmarkReplaceBackend-6                                3402430               360.9 ns/op         2771038 objects/sec
BenchmarkReplaceService-6                                2068555               556.6 ns/op         1796743 objects/sec

Signed-off-by: Jussi Maki <[email protected]>

* loadbalancer: Remove dummy ingress endpoint workaround

Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* operator/helm: Remove creation of dummy ingress endpoint

With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.

Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`s. The creation of the `EndpointSlice` itself will
be done in a separate PR.

Fixes: #19262

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* monitor: report 3rd argument in DBG_GENERIC debug monitor messages

Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)

Also report the 3rd argument, so the monitor message will look as
follows:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)
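
The message format above can be sketched as follows. `formatGeneric` is a
hypothetical helper written for illustration, not the monitor's actual
formatting code.

```go
package main

import "fmt"

// formatGeneric is a hypothetical helper that renders the payload of a
// generic debug message, printing each argument in decimal and hex.
func formatGeneric(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	fmt.Println(formatGeneric(238, 34214060, 170))
}
```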

Signed-off-by: Tobias Klauser <[email protected]>

* policy: cleanup label selector parsing and validation

This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.

With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.

This is not a functional change and does not have any associated user
impact.

Signed-off-by: Deepesh Pathak <[email protected]>

* helm/ztunnel: bind health check to localhost

Security hardening for ztunnel running with hostNetwork: true:

Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).

Signed-off-by: Quang Nguyen <[email protected]>

* ci:wireguard: enable Host Firewall in native routing e2e tests

This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.

Signed-off-by: Simone Magnani <[email protected]>

* mcsapi: Add namespace filtering conditions to ServiceImport controller

Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
  by setting SupportedIPFamilies annotation to empty

This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.

Signed-off-by: Jacques Massa <[email protected]>

* docs: split up network policy language page

Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.

Signed-off-by: Daniel Maslowski <[email protected]>

* golangci-lint: fix and simplify golangci-lint.sh

golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.

Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.

Signed-off-by: Timo Beckers <[email protected]>

* golangci-lint: split kubeapi configuration into separate file

The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.

VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.

Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.

Signed-off-by: Timo Beckers <[email protected]>

* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add parser function for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add flags to enable or disable Cluster Network Policy

Disabled by default. A new Makefile target is added that enables it in kind clusters.

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add watcher for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier

Signed-off-by: Blaz Zupan <[email protected]>

* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description

Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.

Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.

Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").

Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>

* nodemap: converted net.IP to netip.Addr, Part of #24246

- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
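
A minimal sketch of the pattern in the list above, assuming a simplified
`nodeKey` type. The helper `parseNodeKey` and the error variable are
hypothetical names, not taken from the node map code.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// nodeKey is a hypothetical, simplified key type for illustration.
type nodeKey struct {
	addr netip.Addr
}

var errZeroAddr = errors.New("node IP is the zero netip.Addr")

// newNodeKey rejects the zero netip.Addr, which is what an unset address
// looks like after moving away from net.IP (where nil played that role).
func newNodeKey(addr netip.Addr) (nodeKey, error) {
	if !addr.IsValid() {
		return nodeKey{}, errZeroAddr
	}
	return nodeKey{addr: addr}, nil
}

// parseNodeKey wraps parse failures with %w so callers can still inspect
// the underlying error via errors.Is / errors.As / errors.Unwrap.
func parseNodeKey(s string) (nodeKey, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return nodeKey{}, fmt.Errorf("parsing node IP %q: %w", s, err)
	}
	return newNodeKey(addr)
}

func main() {
	k, err := parseNodeKey("10.0.0.1")
	fmt.Println(k.addr, err)

	_, err = parseNodeKey("not-an-ip")
	fmt.Println(err, errors.Unwrap(err) != nil)

	_, err = newNodeKey(netip.Addr{})
	fmt.Println(errors.Is(err, errZeroAddr))
}
```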

Signed-off-by: Sanjeevliv <[email protected]>

* bpf/tests: fix byte ordering for TCP seq/win values

The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.

With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.

This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.

This causes all affected BPF tests to fail. This will be addressed
in the next commit.
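
To illustrate why the two generators disagreed, here is a small sketch that
serializes the same sequence number in network byte order and in host byte
order. It assumes a little-endian host, and the value is arbitrary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// seqWire returns the 4 bytes a TCP sequence number occupies on the wire,
// i.e. in network byte order (big-endian).
func seqWire(seq uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, seq)
	return b
}

// seqHostLE returns the bytes produced when the value is stored without
// conversion on a little-endian host (an assumption for this sketch).
func seqHostLE(seq uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, seq)
	return b
}

func main() {
	const seq = 0x01020304 // arbitrary illustrative value
	fmt.Printf("network order: % x\n", seqWire(seq))
	fmt.Printf("host order:    % x\n", seqHostLE(seq))
}
```

Since the checksum covers these header bytes, a differing byte layout will
generally produce a different checksum for the same logical value.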

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix TCP checksum assertions in all tests

This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.

As part of this change, test failure logs have been updated to correctly
show the value observed in a packet and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix default_data definition for scapy tests

The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.

The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.

As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.

This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.
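
The size difference can be sketched as follows. The payload is a
hypothetical 19-byte placeholder rather than the real `default_data`
contents, and the header sizes assume IPv4 and TCP without options.

```go
package main

import "fmt"

// placeholder is a hypothetical 19-byte payload used only for illustration.
const placeholder = "0123456789abcdefghi"

// withNUL mimics the storage of a C string literal: contents plus a
// trailing NUL byte.
func withNUL(s string) []byte {
	return append([]byte(s), 0)
}

// ipv4TotalLen returns the IPv4 total-length field for a TCP packet that
// carries payload, assuming 20-byte IPv4 and TCP headers (no options).
func ipv4TotalLen(payload []byte) int {
	const ipv4Header, tcpHeader = 20, 20
	return ipv4Header + tcpHeader + len(payload)
}

func main() {
	plain := []byte(placeholder)
	terminated := withNUL(placeholder)
	fmt.Println(len(plain), ipv4TotalLen(plain))
	fmt.Println(len(terminated), ipv4TotalLen(terminated))
}
```

The total-length field is part of the IPv4 header, so a one-byte payload
difference is enough to change the header checksum.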

Signed-off-by: Alasdair McWilliam <[email protected]>

* endpoint, fqdn: remove restoration of deprecated V1 DNSRules

Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.

Signed-off-by: Tobias Klauser <[email protected]>

* endpoint: rename DNS rules field

The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.

Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.

Signed-off-by: Tobias Klauser <[email protected]>

* tests: Ignore identity manager related error in versions < 1.18

Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>

* metrics: remove agent bootstrap metrics

This commit removes the deprecated agent bootstrap metrics.

Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.

Signed-off-by: Marco Hofstetter <[email protected]>

* policy: fix policy tests

This fixes a policy test break caused by a recent change in how the
label source is handled.

Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <[email protected]>

* resource/test: let TestResource_WithFakeClient set resource version

Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* cid/test: let TestUpdatePodLabels set resource version

Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version

Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* bgp/test: correctly set resource version when updating test resources

Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* test/controlplane: adaptation for optimistic concurrency control

Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: fix resource version configuration in tracker

Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.
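
The class of bug described above can be sketched in isolation. The types
and the call ordering below are assumptions made for illustration; they do
not reproduce the tracker code.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object.
type object struct {
	Kind            string
	ResourceVersion string
}

func (o *object) deepCopy() *object { c := *o; return &c }

// fillKindBuggy copies the object before filling in the missing Kind, so
// the caller ends up with a different pointer than the one it passed in.
func fillKindBuggy(o *object) *object {
	if o.Kind == "" {
		o = o.deepCopy()
		o.Kind = "Pod"
	}
	return o
}

// fillKindFixed mutates in place, which is safe when the caller already
// works on its own copy.
func fillKindFixed(o *object) *object {
	if o.Kind == "" {
		o.Kind = "Pod"
	}
	return o
}

// store sets the version through a handle taken before fill runs, then
// returns whatever fill produced.
func store(obj *object, version string, fill func(*object) *object) *object {
	accessor := obj // handle obtained before the potential copy
	stored := fill(obj)
	accessor.ResourceVersion = version // lost if fill returned a copy
	return stored
}

func main() {
	fmt.Printf("%+v\n", *store(&object{}, "7", fillKindBuggy))
	fmt.Printf("%+v\n", *store(&object{}, "7", fillKindFixed))
}
```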

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: let update operations respect resource versioning

Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.

Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
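
A minimal sketch of the version check, assuming a hypothetical in-memory
`tracker` with an `ignoreVersion` switch standing in for the default
behavior of the update command. It is not the statedb-backed tracker.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// object is a hypothetical, minimal stand-in for a Kubernetes object.
type object struct {
	Name            string
	ResourceVersion string
	Data            string
}

var (
	errNotFound = errors.New("object not found")
	errConflict = errors.New("resource version mismatch")
)

// tracker is a hypothetical in-memory object store.
type tracker struct {
	objects map[string]object
	counter uint64
}

func newTracker() *tracker { return &tracker{objects: map[string]object{}} }

// bump assigns the next resource version and stores the object.
func (t *tracker) bump(o object) object {
	t.counter++
	o.ResourceVersion = strconv.FormatUint(t.counter, 10)
	t.objects[o.Name] = o
	return o
}

func (t *tracker) create(o object) object { return t.bump(o) }

// update rejects writes whose resource version does not match the stored
// one, unless ignoreVersion is set.
func (t *tracker) update(o object, ignoreVersion bool) (object, error) {
	cur, ok := t.objects[o.Name]
	if !ok {
		return object{}, errNotFound
	}
	if !ignoreVersion && o.ResourceVersion != cur.ResourceVersion {
		return object{}, errConflict
	}
	return t.bump(o), nil
}

func main() {
	tr := newTracker()
	v1 := tr.create(object{Name: "pod-a", Data: "one"})
	_, err := tr.update(object{Name: "pod-a", ResourceVersion: v1.ResourceVersion, Data: "two"}, false)
	fmt.Println("fresh update:", err)
	_, err = tr.update(object{Name: "pod-a", ResourceVersion: v1.ResourceVersion, Data: "stale"}, false)
	fmt.Println("stale update:", err)
}
```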

Signed-off-by: Marco Iorio <[email protected]>

* chore(deps): update base-images to v1.26.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* chore(deps): update cilium/cilium-cli action to v0.19.1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.gi…
jiashengz pushed a commit to Roblox/cilium that referenced this pull request Feb 23, 2026
* ci: e2e: enhance readability of workflow job name

This commit updates the name of the "Setup & Test" job in the
GitHub Actions workflow for e2e upgrade tests to include only the matrix
parameters "name" and "mode". This change improves the readability
of the workflow runs by providing more context about the specific
configuration being tested.

Prior to this, the name of each job contained the whole matrix combination,
which was cut off in the UI and hard to read. Given that we now
use the same workflow file for running both `minor` and `patch` upgrades,
let's make the displayed name simpler.

The result will be `Setup & Test (ipsec-1, minor)`.

Signed-off-by: Simone Magnani <[email protected]>

* ci: e2e: log matrix configuration in each job

This commit adds, as the first step of the `Setup & Test` job for e2e-upgrade,
a simple step that dumps the matrix configuration being tested.

The previous commit modified the title to display only the matrix entry
name and mode (e.g., `Setup & Test (ipsec-1, minor)`) rather than the
whole configuration, which would be truncated in the UI anyway.

It is true that, given the matrix.name (e.g., ipsec-1), a user can open the
specific file and look up the required configuration, but I think that
having a step where we dump it would speed up and ease debugging in CI.

The output would be similar to:

```
> Log Matrix Configuration
Current matrix configuration:
{
  "name": "wireguard-1",
  "kernel": "5.10",
  "kube-proxy": "iptables",
  "kpr": "true",
  "devices": "{eth0,eth1}",
  "secondary-network": "true",
  "tunnel": "vxlan",
  "encryption": "wireguard",
  "encryption-node": "false",
  "lb-mode": "snat",
  "endpoint-routes": "true",
  "egress-gateway": "true",
  "ingress-controller": "true",
  "mode": "minor"
}
```

Signed-off-by: Simone Magnani <[email protected]>

* endpoint: Log labels as structured JSON objects

Standardize logging in pkg/endpoint so that identityLabels and related fields are logged as structured JSON objects instead of comma-separated strings by explicitly casting labels.Labels to map[string]labels.Label.

Signed-off-by: Jie WU <[email protected]>

```release-note
endpoint: Log labels as structured JSON objects
```

* feat(helm): hubble-ui containers set to pss-restricted

This sets the hubble-ui pods/containers to match k8s
pss-restricted profile along with the optional
`readOnlyRootFilesystem: true`

Signed-off-by: Pat Riehecky <[email protected]>

* helm: allow multicluster-services installCRDs to update CRDs

Previously cilium-operator failed to start if MCS/installCRDs is enabled
because it does not have permissions to update the CRD with this log
message:

level=error msg="Unable to update CRD"
module=operator.operator-controlplane.leader-lifecycle.create-crds
name=serviceimports.multicluster.x-k8s.io
error="customresourcedefinitions.apiextensions.k8s.io
\"serviceimports.multicluster.x-k8s.io\" is forbidden: User
\"system:serviceaccount:kube-system:cilium-operator\" cannot update
resource \"customresourcedefinitions\" in API group
\"apiextensions.k8s.io\" at the cluster scope"

This patch adds the necessary permissions to cilium-operator if you have
mcs/installCRDs enabled.

Fixes: #44210
Fixes: 3874013329d0 ("clustermesh: add config for auto installing
MCS-API CRDs")

Signed-off-by: Florian Ströger <[email protected]>

* policy: (mechanical) refactor out flow lookup types

A subsequent commit will include an alternate policy iteration system,
so it will be nice to move the types to policy/types.

This also removes the now-useless Decision type, as it's not used
anywhere in the codebase.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: add a simple iterative policy simulator

This is a simple userspace tool that executes rules step-by-step. Its
purpose will be to validate more complex policy scenarios, ideally by
fuzzing.

To ensure its output matches that of the existing policy engine, it
matches the LookupFlow method signature, and existing tests validate
that the simulation engine returns the same verdict.

Signed-off-by: Casey Callendrello <[email protected]>

* Add fuzz-based policy testing

This generates random policy corpuses and compares MapState-based policy
calculation with the iterative simulator.

Signed-off-by: Casey Callendrello <[email protected]>

* policy: Fix fuzz testing

Avoid using *testing.F for the logger as then any log within the fuzz
test would fail.

Fix the order of expected and actual for require.Equal.

Add more debugging.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Hide precedence details better

Hide precedence details from the policymap package.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add optional indexing by identity

Add optional mapState indexing by identity to support incremental removal
of generated keys. This is only needed for deletion pass entries, so the
index is only used if the policy has pass verdicts.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add default deny rule when pass verdict is used

Proper processing of pass verdicts requires the default deny rule to be
explicitly added to the mapstate so that it can be seen by pass verdict
entries.

The default rule is added to the next tier if any non-default tiers or
priorities are in use, or if the traffic direction has any pass
rules. This way the pass rule can pass to the added default deny (or
allow) rule.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Fix L7Filter precedence handling for pass verdicts

Deny takes precedence over allow and pass, allow takes precedence over
pass. Define new HasPrecedenceOver() to handle this instead of using just
IsDeny() like before. This would be simpler if Allow was not the zero value,
but changing that would require changing all unit testing code that uses
it as the default.
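
The ordering described above can be sketched as follows. The `Verdict` type
and its constants are hypothetical, except that Allow is kept as the zero
value as noted in the text.

```go
package main

import "fmt"

// Verdict is a hypothetical enum; Allow is deliberately the zero value.
type Verdict uint8

const (
	Allow Verdict = iota
	Deny
	Pass
)

// rank maps a verdict to its precedence: Deny > Allow > Pass. A lookup is
// needed because the numeric values do not follow the precedence order.
func (v Verdict) rank() int {
	switch v {
	case Deny:
		return 2
	case Allow:
		return 1
	default: // Pass
		return 0
	}
}

// HasPrecedenceOver reports whether v strictly wins over other.
func (v Verdict) HasPrecedenceOver(other Verdict) bool {
	return v.rank() > other.rank()
}

func main() {
	fmt.Println(Deny.HasPrecedenceOver(Allow))
	fmt.Println(Allow.HasPrecedenceOver(Pass))
	fmt.Println(Pass.HasPrecedenceOver(Allow))
}
```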

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Fix per-tier priority range allocation

Fix tier base priority calculation. When figuring out the priority range
for each tier, the full range of the remaining tiers must be included to
add enough space for pass verdicts on higher tiers. Then, when setting
the base priority of each tier, this has to be reversed.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add Fuzz cases

Commit fuzzer cases found during development.

Signed-off-by: Jarno Rajahalme <[email protected]>

* policy: Add generated L3/4 entries, multipass support

A pass of a specific identity to a lower tier rule with wildcard identity
should pass the given identity only and keep the wildcard entry at the
original precedence to take care of traffic with other identities. Since
the original entry needs to be kept, a new generated entry with the
identity from the pass entry and the L4 from the passed to entry must be
added.

We missed this case earlier due to BroaderOrEqualKeys only iterating
wildcard identity entries when the new key is a wildcard entry. Entries
that have a broader or equal L4 but more specific L3 are not as a whole
"broader or equal". To handle the need for generated entries for the pass
verdict processing "BroaderOrEqualKeys" is changed to also iterate all
specific L3 keys if the L4 is broader or equal and the given key has the
wildcard identity. The old behavior is retained with
CoveringBroaderOrEqualKeys(). Similarly, NarrowerOrEqualKeys() is renamed
as CoveringNarrowerOrEqualKeys() while NarrowerOrEqualKeys() now also
iterates keys with the wildcard identity when the given key has a
specific identity.

The addition of generated entries requires these entries to be deleted
when that identity is incrementally deleted. Since selector cache is
transactional we can delete all keys with the deleted identity, when the
first key with that identity is deleted. To make this efficient we use
the new id index.

To add support for pass verdicts at multiple tiers, the pass metadata is
now stored as a slice. Overhead to non-pass entries is reduced by storing
the slice via a pointer ('passes'), as most mapStateEntries would not
have any pass metadata.

If 'passes' is non-nil, then the pointed-to slice must have at least
one element, and all elements must have non-zero 'passPrecedence'.

When merging pass metadata we clone the slice to be mutated so that the
same slice can safely be used in multiple entries.
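
A minimal sketch of the pointer-to-slice layout and the clone-on-merge rule,
assuming simplified, hypothetical `passMeta` and `entry` types that only
borrow the field names mentioned above.

```go
package main

import "fmt"

// passMeta is a hypothetical, reduced form of the per-pass metadata.
type passMeta struct {
	passPrecedence uint32
}

// entry keeps pass metadata behind a pointer so that entries without pass
// verdicts pay only for a nil pointer.
type entry struct {
	passes *[]passMeta
}

// addPass merges p into e without mutating a slice that another entry may
// share: the existing slice is cloned before appending.
func (e *entry) addPass(p passMeta) {
	if p.passPrecedence == 0 {
		return // invariant: every element has a non-zero precedence
	}
	var merged []passMeta
	if e.passes != nil {
		merged = append([]passMeta(nil), *e.passes...) // clone
	}
	merged = append(merged, p)
	e.passes = &merged
}

func main() {
	a := entry{}
	a.addPass(passMeta{passPrecedence: 10})

	b := entry{passes: a.passes} // b shares a's slice
	b.addPass(passMeta{passPrecedence: 20})

	fmt.Println(len(*a.passes), len(*b.passes))
}
```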

Split insertWithPasses() from insertWithChanges(); insertWithPasses() is
only called if the policy has any pass verdicts. This reduces the
chance of regressions for non-pass policies.

Log a warning if a policy with pass verdicts is also using auth
requirements, as this combination has not been implemented. Adjust a test
to not claim all features when that is not the case.

Signed-off-by: Jarno Rajahalme <[email protected]>

* pkg/subnet: Fix tag in config subnets field

The Subnets field in the config was declaring a json tag, leading to a
failure of the agent `hive` command (see below). This is due to the fact
that the hive relies on a mapstructure Decoder, not a JSON one, and
therefore requires a mapstructure tag when the config field name is not
equal to the flag name.

Fix the tag on the field using a mapstructure one.

```
make -C daemon/ && ./daemon/cilium-agent hive
...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xacd46e]

goroutine 1 [running]:
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000b540f0, 0xa?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x12e
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a9bdd0, 0x6?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive/cell.(*InfoNode).Print(0xc000a799b0, 0x2?, 0xc000cba750)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/cell/info.go:109 +0x134
github.com/cilium/hive.(*Hive).PrintObjects(0xc0009554a0, {0x51e1240, 0xc0000c0030}, 0x0?)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/hive.go:459 +0x18f
github.com/cilium/hive.(*Hive).Command.func1(0xc000e0fc00?, {0x4b05b96?, 0x4?, 0x4b05aca?})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/cilium/hive/command.go:21 +0x2d
github.com/spf13/cobra.(*Command).execute(0xc000953508, {0x86bac20, 0x0, 0x0})
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1015 +0xb02
github.com/spf13/cobra.(*Command).ExecuteC(0xc000952f08)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1148 +0x465
github.com/spf13/cobra.(*Command).Execute(...)
	/home/ffalzoi/cilium/cilium-2/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4d769d0?)
	/home/ffalzoi/cilium/cilium-2/daemon/cmd/root.go:89 +0x13
main.main()
	/home/ffalzoi/cilium/cilium-2/daemon/main.go:15 +0x1f
```
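
The tag mismatch can be illustrated with the standard `reflect` package
alone. The struct names and the key `example-subnets` are hypothetical, and
`keyFor` only approximates how a tag-driven decoder picks a key.

```go
package main

import (
	"fmt"
	"reflect"
)

// jsonTagged carries only a json tag, which a mapstructure-style decoder
// does not consult.
type jsonTagged struct {
	Subnets []string `json:"example-subnets"`
}

// mapstructureTagged carries the tag such a decoder actually reads.
type mapstructureTagged struct {
	Subnets []string `mapstructure:"example-subnets"`
}

// keyFor returns the key a decoder driven by the "mapstructure" tag would
// use for the named field: the tag value if present, else the field name.
func keyFor(v any, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	if tag := f.Tag.Get("mapstructure"); tag != "" {
		return tag
	}
	return f.Name
}

func main() {
	fmt.Println(keyFor(jsonTagged{}, "Subnets"))
	fmt.Println(keyFor(mapstructureTagged{}, "Subnets"))
}
```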

Fixes: d395d73ad3 ("pkg/subnet: Add subnet config watcher and manager")
Signed-off-by: Fabio Falzoi <[email protected]>

* linux-desired-device: introducing Cilium managed devices reconciler

Adding new reconciler in Cilium datapath/linux which can manage
life cycle of linux links created by Cilium.

Created devices are persisted on disk using a write-ahead log; upon
restart, the owners of the devices are expected to redo the
configuration before calling the finalizer. Stale devices will be pruned.

Implementation is inspired by linux/route/reconciler.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: script tests for desired-devices

Adding script tests to validate device creation and persistence.

Signed-off-by: harsimran pabla <[email protected]>

* linux-desired-device: hook desired-devices cell into main infra

Signed-off-by: harsimran pabla <[email protected]>

* fix: helm intervalSeconds value render bug

intervalSeconds is always of an integral type, no need to check kindIs float64

Fixes: #44206
Signed-off-by: jayl1e <[email protected]>

* bpf: source tuple hash seeds from node config

Move IPv4/IPv6 hash init seeds into node config and wire them from
Maglev config. BPF tuple hashing now reads CONFIG(hash_init{4,6}_seed)
instead of compile-time defines, and the legacy HASH_INIT* defines are
removed from the header writer and node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* feat(install) Allow hubble to run with hostUsers: false

The hubble components do not require direct mapping of container
users to system users.

Signed-off-by: Pat Riehecky <[email protected]>

* docs: fix typos in comments

Signed-off-by: Yohei Yamamoto <[email protected]>

* bpf: correct comments in cil_from_netdev function

Removed conditions from the comment block describing the cil_from_netdev function since the logic has been changed

Signed-off-by: Liyi Huang <[email protected]>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* bpf: lb: Decouple DNAT operation from LB key

Instead of passing the lb4_key/lb6_key to lb4_xlate/lb6_xlate for
checksum calculation and port translation, pass the original destination
address and port directly from the CT tuple.

This change:
1. Removes the key parameter from lb4_xlate/lb6_xlate functions
2. Removes the key parameter from lb4_dnat_request/lb6_dnat_request

The CT tuple already contains the same values that were being read from
the key structure:
- tuple->daddr == key->address (original destination)
- tuple->sport == key->dport (reversed port order in CT tuple)

By removing the xlate path's dependency on key, we can now directly
modify key->address = 0 for wildcard lookups without creating a copy,
simplifying the backend selection logic.

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Handle DSR for remote NodePort services on source node

When DSR and PER_PACKET_LB are enabled, connections fail if a client pod
sends a request to a remote node's NodePort service while the server
pod is located on the same node as the client.

The root cause is that the remote node only performs DNAT, setting the
packet's source address to client's node address, leading to a hairpin
problem. Consequently, the originating node cannot perform the necessary
REV NAT for the reply packets.

To resolve this, remote NodePort service requests are now handled on the
source node when DSR is enabled, similar to the behavior of socket-level
load balancing.

Implementation details:

- Add lb4/6_lookup_wildcard_nodeport_service(): When ENABLE_DSR is
  defined and the regular service lookup fails, check if the destination
  is a remote node's IP with a NodePort port range. If so, perform a
  wildcard lookup (address=0) to find the NodePort service.

- Use wildcard key for backend selection: When dsr_internal flag is set,
  set key->address to 0 before calling lb4/6_select_backend_id(). This
  applies to both CT_NEW (new connections) and CT_REPLY (backend
  re-selection for existing connections). This is needed for backend
  selection algorithms that use slot lookup (e.g., Random), which look
  up backend slots via lb4/6_lookup_backend_slot() using the service
  key. Without a wildcard key, the lookup would fail because backend
  slot entries are stored with the wildcard service key, not with the
  remote node's IP.

- Store original destination in CT entry: The original destination
  address and port (remote node IP and NodePort) are stored in
  ct_state_new.nat_addr/nat_port, which will be written to the CT entry
  for use in reply path RevNAT processing.

- Use cilium_dsr_nat_buffer per-CPU map: The NAT info is detected in
  __per_packet_lb_svc_xlate_4/6(), but the CT entry is created after DNAT
  when the original destination info is no longer available in the packet.
  The per-CPU buffer preserves this info across the DNAT operation.

Existing connection handling:

- This change only affects DSR traffic destined to remote node's
  NodePort. The wildcard lookup is triggered when lb4/6_lookup_service()
  fails, but it only processes packets where the destination is a remote
  node IP with a port in the NodePort range. Other traffic that fails
  the regular lookup is unaffected.

Fixes: #41962

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: lxc: Add RevNAT support for DSR remote NodePort connections

Add reverse NAT support for reply packets of DSR remote NodePort
connections. The forward path stores the original destination address
and port in the CT entry's nat_addr/nat_port fields, which are now
used during reply processing.

Implementation details:

- Extend lb4/6_rev_nat() signature: Add nat_addr and nat_port parameters.
  When nat_port is non-zero, use these values directly for RevNAT.
  When nat_port is zero, fall back to the existing rev_nat_index lookup
  for backward compatibility with existing connections.

- Modify ct_lookup_fill_state(): Copy nat_addr and nat_port from the
  CT entry to ct_state, making them available for reply processing.

- Update ipv4/6_policy(): Check for nat_port in addition to
  rev_nat_index when deciding whether to perform RevNAT. Pass the
  CT entry's NAT information to lb4/6_rev_nat().

- Update nodeport_rev_dnat_ipv4/6(): Adapt to the new lb4/6_rev_nat()
  signature by passing NULL/0 for nat_addr/nat_port (these paths use
  the traditional rev_nat_index lookup).

Existing connection handling:

1. New connections (created after this patch):
   - Forward path stores nat_addr/nat_port in CT entry
   - Reply path uses CT entry's nat_addr/nat_port for RevNAT

2. Existing connections (regular NodePort/DSR traffic):
   - CT entry has nat_addr=0, nat_port=0
   - lb4/6_rev_nat checks nat_port first:
     - If nat_port != 0: use nat_addr/nat_port directly
     - If nat_port == 0: fall back to rev_nat_index lookup
   - This ensures existing connections continue to work

3. Upgrade scenario:
   - Existing connections keep working via rev_nat_index fallback
   - New DSR remote nodeport connections use nat_addr/nat_port
   - No connection disruption during rolling upgrade
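
The fallback order described above can be sketched in Go as follows. This
is only an illustration of the decision logic, not the BPF code: the
ctState and revNATEntry types, the map-based table, and resolveRevNAT are
hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// ctState is a hypothetical, simplified stand-in for the CT state fields
// named in the commit message (nat_addr, nat_port, rev_nat_index).
type ctState struct {
	NATAddr     string
	NATPort     uint16
	RevNATIndex uint16
}

// revNATEntry is a hypothetical RevNAT table entry, keyed by index.
type revNATEntry struct {
	Addr string
	Port uint16
}

// resolveRevNAT returns the address and port to restore on a reply packet.
// A non-zero NATPort takes precedence; otherwise the index lookup is used.
func resolveRevNAT(st ctState, table map[uint16]revNATEntry) (string, uint16, bool) {
	if st.NATPort != 0 {
		return st.NATAddr, st.NATPort, true
	}
	entry, ok := table[st.RevNATIndex]
	return entry.Addr, entry.Port, ok
}

func main() {
	table := map[uint16]revNATEntry{7: {Addr: "10.96.0.10", Port: 53}}

	// New DSR remote NodePort connection: NAT info stored in the CT entry.
	addr, port, ok := resolveRevNAT(ctState{NATAddr: "192.0.2.10", NATPort: 30080}, table)
	fmt.Println(addr, port, ok)

	// Existing connection: nat_port is zero, so the index lookup applies.
	addr, port, ok = resolveRevNAT(ctState{RevNATIndex: 7}, table)
	fmt.Println(addr, port, ok)
}
```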

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* bpf: tests: Add tests for DSR remote NodePort handling

Add BPF unit tests to verify the DSR remote NodePort functionality
for both IPv4 and IPv6.

Test scenarios for DSR mode (tc_lxc_lb4/6_dsr_nodeport.c):

1. Pod -> Remote NodePort -> Local backend (forward path)
   - Client pod sends packet to remote node's NodePort
   - LB selects a backend on the local node (same node as client)
   - Verifies packet is DNATed to local backend IP and port
   - Verifies CT entry contains correct nat_addr (remote node IP)
     and nat_port (NodePort)

2. Local backend -> Pod (reply path)
   - Backend sends reply packet to client
   - Verifies RevNAT is applied correctly
   - Source IP/port changed to remote node IP and NodePort

3. Pod -> Remote NodePort -> Self (hairpin)
   - Client pod sends packet to remote node's NodePort
   - LB selects the client pod itself as the backend
   - Verifies DNAT to client IP and backend port
   - Verifies SNAT to loopback IP for hairpin flow

4. Hairpin reply
   - Pod replies to loopback IP
   - Verifies RevNAT restores remote node IP and NodePort

5. Existing connection handling (UDP)
   - First packet establishes CT entry via legacy path
   - Second packet should use existing CT entry
   - Verifies wildcard lookup is skipped for existing connections

Test scenarios for Hybrid mode (tc_lxc_lb4/6_hybrid_dsr_nodeport.c):

1. DSR service handling
   - Verifies DSR-enabled service triggers wildcard lookup
   - Packet is DNATed to local backend

2. SNAT service handling
   - Verifies SNAT service does NOT trigger wildcard lookup
   - Packet passes through without DNAT (handled by remote node)

Co-authored-by: Siwan Kim <[email protected]>
Signed-off-by: Gyutae Bae <[email protected]>

* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770440937-14da6b9c8a54244f0a67cd90a0deb83e5f110a4a

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update base-images

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix

Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.
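
As a rough illustration of constructing the prefix directly, the sketch
below uses a hypothetical sourceRangeKey4 type (raw IPv4 bytes plus a
prefix length); the actual key layout in pkg/loadbalancer/maps may differ.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical map key: raw IPv4 address bytes and a
// prefix length. It is not the real SourceRangeKey type.
type sourceRangeKey4 struct {
	Addr [4]byte
	Bits int
}

// Prefix builds a netip.Prefix straight from the key bytes, without going
// through an intermediate CIDR object first.
func (k sourceRangeKey4) Prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.Addr), k.Bits)
}

func main() {
	key := sourceRangeKey4{Addr: [4]byte{10, 0, 0, 0}, Bits: 8}
	fmt.Println(key.Prefix())
}
```

netip.Prefix is a small value type, which is one plausible reason the
benchmark shows fewer allocated objects.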

This slightly improves pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service:  49 objs, 10042B alloc,  2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service:  59 objs, 10272B alloc,  3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service:  74 objs, 11128B alloc,  4662B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr

This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.

pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects /  99675kB still reachable (per service:  38 objs,  9974B alloc,  2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service:  56 objs, 10212B alloc,  3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service:  74 objs, 11116B alloc,  4662B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: remove unused SkipLBMap delete methods

The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.

Signed-off-by: Tobias Klauser <[email protected]>

* fix(deps): update all go dependencies main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* docs: Update docsearch to v4.5.4

Pull in the latest theme with newer docsearch plugin version.

Signed-off-by: Joe Stringer <[email protected]>

* ci: update docs-builder

Signed-off-by: Cilium Imagebot <[email protected]>

* Use binary.NativeEndian instead of nl.NativeEndian

Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.

While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.

Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")

Signed-off-by: Tobias Klauser <[email protected]>

* datapath: fix panic during datapath reinitialization

This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.

```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0

goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
        /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```

With the change in this commit when a direct routing device is not found
datapath orchestrator will log a warning and wait for device updates in
the reconciliation loop, skipping reinitialization.

Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")

Signed-off-by: Deepesh Pathak <[email protected]>

* datapath/loader: Add netkit to BPF load tests

Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.

This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.

Signed-off-by: Alasdair McWilliam <[email protected]>

* docs: add netkit requirement to kernel version list

Add Linux kernel requirement for netkit to the System Requirements.

Signed-off-by: Alasdair McWilliam <[email protected]>

* style(bpf/test): fix indentation

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move `icmp_wsum_accumulate` helper

This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move ICMPv6 packet generation to a separate file

The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): reduce ifdef number

Signed-off-by: Andrea Terzolo <[email protected]>

* gateway-api: Update conformance test Make target

This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.

Signed-off-by: Nick Young <[email protected]>

* bpf: introduce DECLARE_CONFIG_KIND

DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.

Signed-off-by: Tobias Klauser <[email protected]>

* bpf: wire events map rate limits through node config

Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* sysdump: Use label selectors for Hubble UI/Relay deployment collection

Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.

This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.

Fixes the issue where:
  cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.

Signed-off-by: darox <[email protected]>

* bpf: lxc: remove unnecessary L3 validation

There's no code that uses the IPv4 header afterwards.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: lxc: fine-tune BPF Host Routing path

Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).

By structuring the code as a switch() statement we can also clean up one
of the goto paths.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: xdp: prefer CTX_ACT_TX over XDP_TX

Return the generic value, so that readers understand what macro they should
be using when handling the result.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf, nat46x64: move RFC6052 prefix into node config

This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.

Updates included:

- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
  dropped.

Signed-off-by: viktor-kurchenko <[email protected]>

* neighbor: Fix description for L2 neighbor discovery

The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.

Co-authored-by: Dylan Reimerink <[email protected]>
Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Paul Chaignon <[email protected]>

* CODEOWNERS: add more specific owners for operator subsystems

Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.

Signed-off-by: Tobias Klauser <[email protected]>

* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles

When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.

This commit adds a random suffix to cluster names to prevent race
conditions, and enhances the failure cleanup step to delete
CloudFormation stacks and orphaned IAM roles when cluster creation fails.

Signed-off-by: André Martins <[email protected]>

* hubble: Fix typos in config/set.go

Signed-off-by: harshitghagre <[email protected]>

* test/helpers: ignore error creating lease lock message

This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.

Signed-off-by: André Martins <[email protected]>

* Fix backend slot index mismatch in LB reconciler

Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
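
A simplified sketch of the difference, assuming slots are numbered from 1
and that maintenance backends are skipped. The backend type and
assignSlots are hypothetical, not the reconciler's actual code.

```go
package main

import "fmt"

// backend is a hypothetical, reduced backend record.
type backend struct {
	ID          uint32
	Maintenance bool
}

// assignSlots maps slot numbers to backend IDs. A separate slotID counter is
// advanced only for backends that are actually written, so skipped
// maintenance backends do not leave holes. Using the loop index instead
// would place the third backend below in slot 3 and leave slot 2 empty.
func assignSlots(backends []backend) map[int]uint32 {
	slots := make(map[int]uint32)
	slotID := 1
	for _, be := range backends {
		if be.Maintenance {
			continue
		}
		slots[slotID] = be.ID
		slotID++
	}
	return slots
}

func main() {
	backends := []backend{{ID: 1}, {ID: 2, Maintenance: true}, {ID: 3}}
	fmt.Println(assignSlots(backends))
}
```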

Signed-off-by: Aman-Cool <[email protected]>

* vendor: Bump to StateDB v0.6.3

This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().

Signed-off-by: Jussi Maki <[email protected]>

* docs: Fix upgrade note category for tproxy

There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.

CC: Alasdair McWilliam <[email protected]>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <[email protected]>

* policy: Fix PASS verdict for non-consecutive tiers

Signed-off-by: Blaz Zupan <[email protected]>

* loadbalancer/healthserver: refresh ProxyRedirect per request

This commit fixes stale ProxyRedirect reads in the health server by
reloading Service state from the services table on each request. This
prevents incorrect local endpoint counts when the Envoy redirect state
changes after the listener is created, which does happen in practice.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: provide WaitForNodeInformation

This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.

This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.

This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: pass localnodestore to synchronizer

With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.

This way, the synchronizer can update the ip allocation ranges without
using the global functions.

Signed-off-by: Marco Hofstetter <[email protected]>

* ci: e2e: add `kernel` to workflow job names

As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.

The result will be `Setup & Test (ipsec-1, minor, 5.10)`.

Signed-off-by: Simone Magnani <[email protected]>

* pkg/datapath/bandwidth: optimize host endpoint QoS setup

The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.

This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint

Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
  whether the host endpoint ID has been set, avoiding duplicate
  constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
  during initialization
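
The fast path can be sketched roughly as follows. The manager type, the
ensureHostQoS method, and the injected getEndpointID function are
hypothetical; the lookups counter exists only to make the effect visible.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// manager is a hypothetical, reduced bandwidth manager.
type manager struct {
	hostQoSConfigured atomic.Bool
	lookups           atomic.Int64 // counts slow-path lookups, for illustration
	getEndpointID     func() (uint64, bool)
}

// ensureHostQoS performs the one-time host endpoint setup. Once the flag is
// set, later calls return immediately without any lookup.
func (m *manager) ensureHostQoS() {
	if m.hostQoSConfigured.Load() {
		return // fast path: already configured
	}
	m.lookups.Add(1)
	if _, ok := m.getEndpointID(); !ok {
		return // host endpoint ID not set yet; try again on the next call
	}
	// The one-time host endpoint QoS setup would happen here.
	m.hostQoSConfigured.Store(true)
}

func main() {
	m := &manager{getEndpointID: func() (uint64, bool) { return 42, true }}
	for i := 0; i < 3; i++ {
		m.ensureHostQoS()
	}
	fmt.Println("lookups:", m.lookups.Load())
}
```

Returning (uint64, bool) lets the caller distinguish "not set yet" from a
real ID without sharing a sentinel constant between packages.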

Signed-off-by: Anand Kumar Shaw <[email protected]>

* clustermesh: fix a few misc issues with MCS-API doc

This commit fixes two issues with the MCS-API doc:
- a simple typo in the enabled keyword
- change the code-block to parsed-literal, as the |SCM_WEB| "variable"
  was not evaluated/replaced in the final doc inside a code-block

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* Docs: improve docs around ipsec upgrade in 1.18

Signed-off-by: darox <[email protected]>

* docs(ztunnel): fix duplicate word (a set)

Signed-off-by: Alexis La Goutte <[email protected]>

* docs(ztunnel): add missing backslash

add missing backslash for install with Cilium CLI

Signed-off-by: Alexis La Goutte <[email protected]>

* clustermesh: helm: remove clustermesh.enableMCSAPISupport

This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* daemon: enforce iptables rules are present when node-port is enabled

Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.

Moving forward, disabling iptables rule installation while node-port or
eBPF masquerading is enabled is considered a fatal error.

Signed-off-by: Louis DeLosSantos <[email protected]>

* bpf,nodeport: generalize SNAT conflict detection

Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.

Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.

This removes the dependency on the direct routing interface in the
node-port path.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ztunnel: introduce end to end connectivity tests

The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.

The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.

Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.

Signed-off-by: Louis DeLosSantos <[email protected]>
Signed-off-by: Quang Nguyen <[email protected]>
Signed-off-by: Robin Gögge <[email protected]>

* ci,ztunnel: add workflows for ztunnel encryption tests

Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.

The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ci,ztunnel: add ztunnel cert script to actions

Signed-off-by: Louis DeLosSantos <[email protected]>

* datapath: remove GetRoutePostEncryptMTU()

The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").

Signed-off-by: Julian Wiedmann <[email protected]>

* datapath: ipsec: remove clean up code for encrypt IP rule

https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer/api: include proxy-redirect as backend

Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort
19   [::]:30965/TCP/i            NodePort
21   0.0.0.0:30965/TCP           NodePort
23   0.0.0.0:30965/TCP/i         NodePort
25   10.96.245.249:80/TCP        ClusterIP
26   172.19.255.1:80/TCP         LoadBalancer
27   172.19.255.1:80/TCP/i       LoadBalancer
28   [fd00:10:96::d99f]:80/TCP   ClusterIP
```

Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.

Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`

Result

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort       1 => [::1]:14543/TCP (active)
19   [::]:30965/TCP/i            NodePort       1 => [::1]:14543/TCP (active)
21   0.0.0.0:30965/TCP           NodePort       1 => 127.0.0.1:14543/TCP (active)
23   0.0.0.0:30965/TCP/i         NodePort       1 => 127.0.0.1:14543/TCP (active)
25   10.96.245.249:80/TCP        ClusterIP      1 => 127.0.0.1:14543/TCP (active)
26   172.19.255.1:80/TCP         LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
27   172.19.255.1:80/TCP/i       LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
28   [fd00:10:96::d99f]:80/TCP   ClusterIP      1 => [::1]:14543/TCP (active)
```

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/ipcachelistener: use injected localnodestore

This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/linuxnodehandler: retrieve node ips from localnodestore

This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.

Signed-off-by: Marco Hofstetter <[email protected]>

* identity/cache: use injected localnodestore

This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* node/address: remove global functions `GetIP[v4/v6]`

This commit removes the unused global functions `GetIPv4` & `GetIPv6`.

Signed-off-by: Marco Hofstetter <[email protected]>

* test: remove K8sDatapathBandwidthTest

The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.

Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <[email protected]>

* wireguard: remove cleanup code for old userspace devices

2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17, we can now stop cleaning up any network device
created for that mode.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer: Check for equality and skip insert when not changed

This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.

As we no longer forcefully update the frontend, the orphan detection has to
actually build a set to check whether a frontend has been removed.

Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6           3549            317691 ns/op            314771 objects/sec
BenchmarkInsertBackend-6                            2818            423975 ns/op            235863 objects/sec
BenchmarkReplaceBackend-6                         326682              3793 ns/op            263669 objects/sec
BenchmarkReplaceService-6                        2327074               509.4 ns/op         1963230 objects/sec

After:
Benchmark_UpsertServiceAndFrontends_100-6                   3464            331791 ns/op            301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6        14652             81250 ns/op           1230766 objects/sec
BenchmarkInsertBackend-6                                    2956            401100 ns/op            249315 objects/sec
BenchmarkReplaceBackend-6                                3402430               360.9 ns/op         2771038 objects/sec
BenchmarkReplaceService-6                                2068555               556.6 ns/op         1796743 objects/sec

Signed-off-by: Jussi Maki <[email protected]>
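
As a rough illustration of the two ideas in this commit (skip the write when nothing changed, and detect orphans with an explicit set), here is a minimal sketch. The `frontend` and `table` types are hypothetical stand-ins, not the actual load-balancer tables, and plain struct comparison stands in for the generated `DeepEqual()`.

```go
package main

import (
	"fmt"
	"sort"
)

// frontend is a hypothetical, simplified stand-in for a load-balancer frontend.
type frontend struct {
	Address string
	Backend string
}

// table is a hypothetical in-memory store keyed by frontend address.
type table struct {
	frontends map[string]frontend
	writes    int // number of writes actually performed
}

func newTable() *table { return &table{frontends: map[string]frontend{}} }

// upsert writes fe only when it differs from the stored entry and reports
// whether a write happened. Struct equality stands in for DeepEqual().
func (t *table) upsert(fe frontend) bool {
	if old, ok := t.frontends[fe.Address]; ok && old == fe {
		return false // unchanged: skip the insert
	}
	t.frontends[fe.Address] = fe
	t.writes++
	return true
}

// orphans returns the stored addresses that are missing from desired.
// Because unchanged frontends are no longer rewritten, membership has to be
// checked against an explicit set.
func (t *table) orphans(desired []frontend) []string {
	want := make(map[string]struct{}, len(desired))
	for _, fe := range desired {
		want[fe.Address] = struct{}{}
	}
	var out []string
	for addr := range t.frontends {
		if _, ok := want[addr]; !ok {
			out = append(out, addr)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	tbl := newTable()
	fe := frontend{Address: "10.96.0.10:53/TCP", Backend: "10.244.1.69:53/TCP"}
	tbl.upsert(fe)
	tbl.upsert(fe) // identical: skipped
	fmt.Println("writes:", tbl.writes)
	fmt.Println("orphans:", tbl.orphans(nil))
}
```

The skip path is what the `_Unchanged` benchmark above exercises: the comparison is cheaper than a write, at the cost of building the set during orphan detection.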

* loadbalancer: Remove dummy ingress endpoint workaround

Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* operator/helm: Remove creation of dummy ingress endpoint

With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.

Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`'s. The creation of the `EndpointSlice` itself will
be done in a separate PR.

Fixes: #19262

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* monitor: report 3rd argument in DBG_GENERIC debug monitor messages

Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)

Also report the 3rd argument, so the monitor message will look as
follows:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)

Signed-off-by: Tobias Klauser <[email protected]>
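
The message layout shown above can be reproduced with a small formatter. This is only a sketch: `formatGeneric` is a hypothetical helper name, not the monitor's actual function.

```go
package main

import "fmt"

// formatGeneric renders a generic debug message with all three arguments,
// each as decimal followed by hexadecimal. The name is hypothetical.
func formatGeneric(arg1, arg2, arg3 uint32) string {
	return fmt.Sprintf("No message, arg1=%d (%#x) arg2=%d (%#x) arg3=%d (%#x)",
		arg1, arg1, arg2, arg2, arg3, arg3)
}

func main() {
	// Same values as in the example message above.
	fmt.Println(formatGeneric(238, 34214060, 170))
}
```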

* policy: cleanup label selector parsing and validation

This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.

With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.

This is not a functional change and does not have any associated user
impact.

Signed-off-by: Deepesh Pathak <[email protected]>

* helm/ztunnel: bind health check to localhost

Security hardening for ztunnel running with hostNetwork: true:

Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).

Signed-off-by: Quang Nguyen <[email protected]>

* ci:wireguard: enable Host Firewall in native routing e2e tests

This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.

Signed-off-by: Simone Magnani <[email protected]>

* mcsapi: Add namespace filtering conditions to ServiceImport controller

Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
  by setting SupportedIPFamilies annotation to empty

This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.

Signed-off-by: Jacques Massa <[email protected]>

* docs: split up network policy language page

Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.

Signed-off-by: Daniel Maslowski <[email protected]>

* golangci-lint: fix and simplify golangci-lint.sh

golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.

Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.

Signed-off-by: Timo Beckers <[email protected]>

* golangci-lint: split kubeapi configuration into separate file

The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.

VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.

Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.

Signed-off-by: Timo Beckers <[email protected]>

* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add parser function for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add flags to enable or disable Cluster Network Policy

Disabled by default. A new Makefile target is added that enables it in kind clusters.

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add watcher for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier

Signed-off-by: Blaz Zupan <[email protected]>

* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description

Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.

Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.

Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").

Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>

* nodemap: converted net.IP to netip.Addr, Part of #24246

- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context

Signed-off-by: Sanjeevliv <[email protected]>

* bpf/tests: fix byte ordering for TCP seq/win values

The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.

With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.

This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.

This causes all affected BPF tests to fail. This will be addressed
in the next commit.

Signed-off-by: Alasdair McWilliam <[email protected]>
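
To make the byte-order difference concrete, the sketch below encodes the same sequence number in network byte order (big-endian) and in the layout a little-endian host would produce without conversion. It is written in Go for illustration only; the actual pktgen code is C, and the helper names here are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wireSeq encodes a TCP sequence number as it appears on the wire:
// network byte order, which is big-endian.
func wireSeq(seq uint32) [4]byte {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], seq)
	return b
}

// littleEndianSeq shows what a little-endian host stores when the value is
// written without conversion.
func littleEndianSeq(seq uint32) [4]byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], seq)
	return b
}

func main() {
	const seq = 0x01020304 // arbitrary example value
	fmt.Printf("network order: % x\n", wireSeq(seq))
	fmt.Printf("little-endian: % x\n", littleEndianSeq(seq))
}
```

Since the TCP checksum is computed over these bytes, the two encodings yield different checksums, which is why the assertions in existing tests have to change.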

* bpf/tests: fix TCP checksum assertions in all tests

This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.

As part of this change, test failure logs have been updated to correctly
show the value observed in a packet, and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix default_data definition for scapy tests

The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.

The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.

As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.

This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.

Signed-off-by: Alasdair McWilliam <[email protected]>

* endpoint, fqdn: remove restoration of deprecated V1 DNSRules

Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.

Signed-off-by: Tobias Klauser <[email protected]>

* endpoint: rename DNS rules field

The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.

Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.

Signed-off-by: Tobias Klauser <[email protected]>

* tests: Ignore identity manager related error in versions < 1.18

Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>

* metrics: remove agent bootstrap metrics

This commit removes the deprecated agent bootstrap metrics.

Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics, similar to what has been
introduced for the endpoint restoration logic.

Signed-off-by: Marco Hofstetter <[email protected]>

* policy: fix policy tests

This fixes a policy breakage caused by a recent change in how the label
source is handled.

Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Odin Ugedal <[email protected]>

* resource/test: let TestResource_WithFakeClient set resource version

Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* cid/test: let TestUpdatePodLabels set resource version

Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version

Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* bgp/test: correctly set resource version when updating test resources

Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* test/controlplane: adaptation for optimistic concurrency control

Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: fix resource version configuration in tracker

Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: let update operations respect resource versioning

Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.

Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.

Signed-off-by: Marco Iorio <[email protected]>

* chore(deps): update base-images to v1.26.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* chore(deps): update cilium/cilium-cli action to v0.19.1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.gi…
fzu-huang pushed a commit to fzu-huang/cilium that referenced this pull request Feb 25, 2026
Due to cilium#42661 and
cilium#42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
cilium#42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
jiashengz pushed a commit to Roblox/cilium that referenced this pull request Feb 25, 2026
* chore(deps): update module sigs.k8s.io/kube-api-linter to v0.0.0-20260206102632-39e3d06a2850

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update base-images

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* loadbalancer/maps: let SourceRangeKey.GetCIDR return netip.Prefix

Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.

This slightly improves pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service:  49 objs, 10042B alloc,  2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service:  59 objs, 10272B alloc,  3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service:  74 objs, 11128B alloc,  4662B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

Signed-off-by: Tobias Klauser <[email protected]>
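
As an illustration of constructing the `netip.Prefix` right away, the sketch below builds it from raw key fields with `netip.PrefixFrom`. The `sourceRangeKey4` type and its field layout are assumptions for this example, not the actual map key definition.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sourceRangeKey4 is a hypothetical, simplified IPv4 source-range map key.
type sourceRangeKey4 struct {
	PrefixLen uint32
	Address   [4]byte
}

// prefix builds the netip.Prefix directly from the raw key fields, without
// going through an intermediate CIDR type. Masked() clears the host bits.
func (k sourceRangeKey4) prefix() netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4(k.Address), int(k.PrefixLen)).Masked()
}

func main() {
	k := sourceRangeKey4{PrefixLen: 16, Address: [4]byte{10, 1, 2, 3}}
	fmt.Println(k.prefix())
}
```

Both `netip.Addr` and `netip.Prefix` are small value types, which is consistent with the lower object counts reported above.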

* loadbalancer/maps: let ServiceKey.GetAddress return netip.Addr

This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.

pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects /  99675kB still reachable (per service:  38 objs,  9974B alloc,  2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service:  56 objs, 10212B alloc,  3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service:  74 objs, 11116B alloc,  4662B in-use)

Signed-off-by: Tobias Klauser <[email protected]>

* loadbalancer/maps: remove unused SkipLBMap delete methods

The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f8129d1a ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.

Signed-off-by: Tobias Klauser <[email protected]>

* fix(deps): update all go dependencies main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* docs: Update docsearch to v4.5.4

Pull in the latest theme with newer docsearch plugin version.

Signed-off-by: Joe Stringer <[email protected]>

* ci: update docs-builder

Signed-off-by: Cilium Imagebot <[email protected]>

* Use binary.NativeEndian instead of nl.NativeEndian

Use the NativeEndian native-endian var provided by the Go standard
library encoding/binary package instead of the version from the
netlink/nl package.

While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.

Follow-up to commit 2c1b49cac70b ("byteorder: use binary.NativeEndian")

Signed-off-by: Tobias Klauser <[email protected]>

* datapath: fix panic during datapath reinitialization

This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.

```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0

goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
        /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```

With the change in this commit, when a direct routing device is not found,
the datapath orchestrator logs a warning and waits for device updates in
the reconciliation loop, skipping reinitialization.

Fixes 8fae439710fd4b426beae9190957d2380a96bed6 ("datapath: move DirectRoutingDevice validation to orchestrator")

Signed-off-by: Deepesh Pathak <[email protected]>
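
The skip-and-wait behaviour can be sketched as a loop over device updates. Everything here is hypothetical (the `reconcile` function, its parameters, and the use of an empty string for "no device"); it only illustrates the control flow and is not the orchestrator's code.

```go
package main

import (
	"fmt"
	"log"
)

// reconcile (hypothetical) consumes device updates. When a direct routing
// device is required but missing, it logs a warning and waits for the next
// update instead of reinitializing with an empty address.
func reconcile(updates <-chan string, requireDevice bool, reinit func(dev string) error) (skipped int, err error) {
	for dev := range updates {
		if requireDevice && dev == "" {
			log.Printf("direct routing device not found, waiting for device updates")
			skipped++
			continue
		}
		if err := reinit(dev); err != nil {
			return skipped, err
		}
	}
	return skipped, nil
}

func main() {
	updates := make(chan string, 2)
	updates <- ""     // device dropped, e.g. during a network restart
	updates <- "eth0" // device is back
	close(updates)

	skipped, err := reconcile(updates, true, func(dev string) error {
		fmt.Println("reinitializing datapath with", dev)
		return nil
	})
	fmt.Println("skipped:", skipped, "err:", err)
}
```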

* datapath/loader: Add netkit to BPF load tests

Commit 854473726b ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.

This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.

Signed-off-by: Alasdair McWilliam <[email protected]>

* docs: add netkit requirement to kernel version list

Add Linux kernel requirement for netkit to the System Requirements.

Signed-off-by: Alasdair McWilliam <[email protected]>

* style(bpf/test): fix indentation

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move `icmp_wsum_accumulate` helper

This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): move ICMPv6 packet generation to a separate file

The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.

Signed-off-by: Andrea Terzolo <[email protected]>

* refactor(bpf): reduce ifdef number

Signed-off-by: Andrea Terzolo <[email protected]>

* gateway-api: Update conformance test Make target

This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.

Signed-off-by: Nick Young <[email protected]>

* bpf: introduce DECLARE_CONFIG_KIND

DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.

Signed-off-by: Tobias Klauser <[email protected]>

* bpf: wire events map rate limits through node config

Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>

* sysdump: Use label selectors for Hubble UI/Relay deployment collection

Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.

This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.

Fixes the issue where:
  cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.

Signed-off-by: darox <[email protected]>
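
The difference between looking up a fixed name and listing by label selector can be sketched on in-memory data. The `deployment` type, `parseSelector`, and `listByLabels` are hypothetical helpers written for this example and do not use the Kubernetes client.

```go
package main

import (
	"fmt"
	"strings"
)

// deployment is a hypothetical, simplified deployment record.
type deployment struct {
	Name   string
	Labels map[string]string
}

// parseSelector parses a simple "k1=v1,k2=v2" equality selector.
func parseSelector(s string) (map[string]string, error) {
	sel := make(map[string]string)
	for _, term := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(term, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid selector term %q", term)
		}
		sel[k] = v
	}
	return sel, nil
}

// listByLabels returns every deployment whose labels satisfy the selector,
// regardless of the deployment's name.
func listByLabels(all []deployment, sel map[string]string) []deployment {
	var out []deployment
	for _, d := range all {
		match := true
		for k, v := range sel {
			if got, ok := d.Labels[k]; !ok || got != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []deployment{
		{Name: "hubble-ui-blue", Labels: map[string]string{"k8s-app": "hubble-ui-blue"}},
		{Name: "hubble-relay", Labels: map[string]string{"k8s-app": "hubble-relay"}},
	}
	sel, err := parseSelector("k8s-app=hubble-ui-blue")
	if err != nil {
		panic(err)
	}
	for _, d := range listByLabels(all, sel) {
		fmt.Println("collecting", d.Name)
	}
}
```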

* bpf: lxc: remove unnecessary L3 validation

There's no code that uses the IPv4 header afterwards.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: lxc: fine-tune BPF Host Routing path

Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).

By structuring the code as a switch() statement we can also clean up one
of the goto paths.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf: xdp: prefer CTX_ACT_TX over XDP_TX

Return the generic value, so that readers understand what macro they should
be using when handling the result.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf, nat46x64: move RFC6052 prefix into node config

This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.

Updates included:

- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
  dropped.

Signed-off-by: viktor-kurchenko <[email protected]>

* neighbor: Fix description for L2 neighbor discovery

The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.

Co-authored-by: Dylan Reimerink <[email protected]>
Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Paul Chaignon <[email protected]>

* CODEOWNERS: add more specific owners for operator subsystems

Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams for certain subsystems which require more in-depth knowledge in
these particular areas.

Signed-off-by: Tobias Klauser <[email protected]>

* .github/workflows: eks-cluster-pool-manager: fix race condition and cleanup leaked IAM roles

When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.

This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.
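
The naming collision and its fix can be illustrated with a minimal Go sketch. The real workflow is a GitHub Actions script, so the function name `clusterName`, the `prefix-timestamp-suffix` layout and the 4-byte suffix length are all assumptions made for illustration only.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// clusterName builds a name from a prefix, a timestamp with 1-second
// precision and a random hex suffix. The layout is hypothetical; it only
// shows why the suffix keeps two names generated in the same second apart.
func clusterName(prefix string, unixSeconds int64) (string, error) {
	suffix := make([]byte, 4)
	if _, err := rand.Read(suffix); err != nil {
		return "", fmt.Errorf("generating random suffix: %w", err)
	}
	return fmt.Sprintf("%s-%d-%s", prefix, unixSeconds, hex.EncodeToString(suffix)), nil
}

func main() {
	now := time.Now().Unix()
	// Two "parallel jobs" that observe the same second.
	a, _ := clusterName("pool", now)
	b, _ := clusterName("pool", now)
	fmt.Println(a)
	fmt.Println(b)
}
```

Without the suffix both calls would return the same string, which is the situation that makes the second stack creation fail.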

Signed-off-by: André Martins <[email protected]>

* hubble: Fix typos in config/set.go

Signed-off-by: harshitghagre <[email protected]>

* test/helpers: ignore error creating lease lock message

This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.

Signed-off-by: André Martins <[email protected]>

* Fix backend slot index mismatch in LB reconciler

Use slotID instead of loop index when setting backend slots to avoid
gaps when maintenance backends are skipped.
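
A minimal sketch of the idea, not the reconciler's actual code: the `backend` type, the `assignSlots` helper and the 1-based slot numbering are assumptions. The point is that the slot counter advances only for backends that are kept, so skipping a backend leaves no gap.

```go
package main

import "fmt"

// backend is a hypothetical, simplified stand-in for a load-balancer backend.
type backend struct {
	addr        string
	maintenance bool
}

// assignSlots maps slot numbers to backend addresses. It uses a separate
// slotID counter instead of the loop index, so skipped backends do not
// consume a slot.
func assignSlots(backends []backend) map[int]string {
	slots := make(map[int]string)
	slotID := 1 // 1-based numbering is an assumption of this sketch
	for _, b := range backends {
		if b.maintenance {
			continue
		}
		slots[slotID] = b.addr
		slotID++
	}
	return slots
}

func main() {
	slots := assignSlots([]backend{
		{addr: "10.0.0.1:80"},
		{addr: "10.0.0.2:80", maintenance: true},
		{addr: "10.0.0.3:80"},
	})
	fmt.Println(len(slots), slots[1], slots[2])
}
```

Had the loop index been used, the third backend would land in slot 3 and slot 2 would stay empty.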

Signed-off-by: Aman-Cool <[email protected]>

* vendor: Bump to StateDB v0.6.3

This restores the ability to use Changes() against a WriteTxn that is
targeting the same table as Changes().

Signed-off-by: Jussi Maki <[email protected]>

* docs: Fix upgrade note category for tproxy

There's no new option here, it's a change in behaviour. Fix the category
for the upgrade note.

CC: Alasdair McWilliam <[email protected]>
Fixes: c257b3d0aca8 ("datapath/connector: do not support netkit and bpf.tproxy=true")
Signed-off-by: Joe Stringer <[email protected]>

* policy: Fix PASS verdict for non-consecutive tiers

Signed-off-by: Blaz Zupan <[email protected]>

* loadbalancer/healthserver: refresh ProxyRedirect per request

This commit fixes stale ProxyRedirect reads in the health server by reloading Service
state from the services table on each request. This prevents incorrect
local endpoint counts when Envoy redirect state changes after the
listener is created (which is the case).

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: provide WaitForNodeInformation

This commit moves global function `k8s.WaitForNodeInformation`
into the `localNodeSynchronizer`. To prevent dependencies to the
synchronizer, the `LocalNodeStore` re-exposes the same method and
delegates the call to the synchronizer.

This way, the legacy daemon initialization logic no longer needs to
depend on the (k8s/cilium)-node resources.

This step isn't final. Eventually, the wait logic should watch
the state db "local node" for the changes.

Signed-off-by: Marco Hofstetter <[email protected]>

* localnodestore: pass localnodestore to synchronizer

With this commit, the `LocalNodeStore` passes a pointer
to itself to the synchronizer when calling `WaitForNodeInformation`.

This way, the synchronizer can update the ip allocation ranges without
using the global functions.

Signed-off-by: Marco Hofstetter <[email protected]>

* ci: e2e: add `kernel` to workflow job names

As a follow-up of changes introduced in https://github.com/cilium/cilium/pull/44126
to improve the readability of workflow job names in the GitHub UI, this
commit adds the `kernel` parameter to the name too. In OpenSearch,
it would be good to differentiate these jobs by the used kernel.
Given the kernel is a parameter that may vary frequently, it is good to be
able to differentiate runs that used the same configuration but different
kernels. That's also good in case of regressions when changing kernel.

The result will be `Setup & Test (ipsec-1, minor, 5.10)`.

Signed-off-by: Simone Magnani <[email protected]>

* pkg/datapath/bandwidth: optimize host endpoint QoS setup

The bandwidth manager was calling node.GetEndpointID() and performing
a table lookup on every UpdateBandwidthLimit() call, even though the
host endpoint QoS only needs to be set up once.

This commit introduces an atomic boolean flag that tracks whether the
host endpoint QoS has been configured. After the initial setup:
- Skip the GetEndpointID() call (fast path)
- Skip the Table.Get() call entirely
- Skip the redundant Insert() call for host endpoint

Additionally, based on review feedback:
- Change node.GetEndpointID() to return (uint64, bool) to indicate
  whether the host endpoint ID has been set, avoiding duplicate
  constants across packages
- Change node.endpointID to atomic.Uint64 for thread-safe access
  during initialization
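
The fast path described above could look roughly like the following sketch. It is not the bandwidth manager's code: the `manager` type, its fields and the convention that an ID of 0 means "not set yet" are assumptions. The `setups` counter exists only so the example can show how often the slow path ran, and is not itself safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hostEndpointID holds the host endpoint ID; 0 means "not set yet"
// (an assumption of this sketch).
var hostEndpointID atomic.Uint64

// getEndpointID returns the ID and whether it has been set.
func getEndpointID() (uint64, bool) {
	id := hostEndpointID.Load()
	return id, id != 0
}

// manager is a hypothetical stand-in for the bandwidth manager.
type manager struct {
	hostQoSDone atomic.Bool
	hostID      uint64
	setups      int // number of times the slow path ran (demo only)
}

func (m *manager) updateBandwidthLimit() {
	if m.hostQoSDone.Load() {
		return // fast path: host endpoint QoS is already configured
	}
	id, ok := getEndpointID()
	if !ok {
		return // ID not known yet; try again on the next call
	}
	m.hostID = id
	m.setups++
	m.hostQoSDone.Store(true)
}

func main() {
	m := &manager{}
	m.updateBandwidthLimit() // ID not set: nothing happens
	hostEndpointID.Store(42)
	m.updateBandwidthLimit() // slow path runs once
	m.updateBandwidthLimit() // fast path
	fmt.Println(m.setups, m.hostID)
}
```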

Signed-off-by: Anand Kumar Shaw <[email protected]>

* clustermesh: fix a few misc issues with MCS-API doc

This commit fixes two issues with the MCS-API doc:
- a simple typo on the enabled keyword
- change the code-block in parsed-literal as the |SCM_WEB| "variable"
  was not evaluated/replaced in the final doc with a code-block

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* Docs: improve docs around ipsec upgrade in 1.18

Signed-off-by: darox <[email protected]>

* docs(ztunnel): fix duplicate word (a set)

Signed-off-by: Alexis La Goutte <[email protected]>

* docs(ztunnel): add missing backslash

add missing backslash for install with Cilium CLI

Signed-off-by: Alexis La Goutte <[email protected]>

* clustermesh: helm: remove clustermesh.enableMCSAPISupport

This option was deprecated and its removal was announced for Cilium
1.20, so let's drop it now that we are in the 1.20 cycle!

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* daemon: enforce iptable rules are present when node-port is enabled

Moving forward the node-port code path will rely on iptables to
identify locally generated traffic and create NAT reservations for this
traffic in order to avoid reservation conflicts.

We will consider it a fatal error moving forward to disable iptable rule
installation while node-port or eBPF masquerading is enabled.

Signed-off-by: Louis DeLosSantos <[email protected]>

* bpf,nodeport: generalize SNAT conflict detection

Prior to this commit the nodeport SNAT conflict detection assumed host
traffic was sourced from the direct routing interface's IP and always
egressed on this device.

Remove this assumption by detecting locally generated traffic from any
egress interface 'bpf_host' is attached to.

This removes the dependency on the direct routing interface in the
node-port path.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ztunnel: introduce end to end connectivity tests

The test infrastructure deploys three dedicated namespaces with client
and echo-server pods in each: two enrolled namespaces for testing
cross-namespace mTLS scenarios and one unenrolled namespace for baseline
verification. Pod affinity rules ensure echo-same-node pods co-locate
with clients while echo-other-node pods schedule on different nodes,
enabling both intra-node and inter-node traffic validation.

The test scenarios verify ztunnel mTLS behavior by dynamically labeling
namespaces with the io.cilium/mtls-enabled label during test execution.
For enrolled pod pairs, the tests assert that traffic flows through port
15008 (the ztunnel HBONE proxy) and that no unencrypted traffic appears
on port 8080. For unenrolled pods, the assertions are inverted to confirm
traffic bypasses ztunnel entirely. Cross-namespace scenarios confirm that
mTLS works correctly when client and server reside in separately enrolled
namespaces.

Packet capture validation runs from host network pods using tcpdump,
which is required since ztunnel operates in the host network namespace.
The tests query the ztunnel admin API to verify workload registration
before generating traffic, ensuring that any state the test depends on
has converged before making assertions.

Signed-off-by: Louis DeLosSantos <[email protected]>
Signed-off-by: Quang Nguyen <[email protected]>
Signed-off-by: Robin Gögge <[email protected]>

* ci,ztunnel: add workflows for ztunnel encryption tests

Add a new GitHub Actions workflow to run end-to-end tests for ztunnel
encryption in Cilium.

The new /ci-ztunnel-e2e trigger is added to the Ariane configuration,
pointing to the newly created conformance-ztunnel-e2e.yaml workflow
file.

Signed-off-by: Louis DeLosSantos <[email protected]>

* ci,ztunnel: add ztunnel cert script to actions

Signed-off-by: Louis DeLosSantos <[email protected]>

* datapath: remove GetRoutePostEncryptMTU()

The last user was removed by
f81d8964ce4a ("ipsec: don't reduce post-encrypt MTU for tunnel overhead").

Signed-off-by: Julian Wiedmann <[email protected]>

* datapath: ipsec: remove clean up code for encrypt IP rule

https://github.com/cilium/cilium/pull/41699 stopped installing this IP
rule in the v1.19 release. With v1.20 we can now stop cleaning it up.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer/api: include proxy-redirect as backend

Currently, listing services via `cilium-dbg service list` displays
no backends for the services that use L7 proxy redirection (e.g.
Gateway API / Ingress loadbalancer service).

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort
19   [::]:30965/TCP/i            NodePort
21   0.0.0.0:30965/TCP           NodePort
23   0.0.0.0:30965/TCP/i         NodePort
25   10.96.245.249:80/TCP        ClusterIP
26   172.19.255.1:80/TCP         LoadBalancer
27   172.19.255.1:80/TCP/i       LoadBalancer
28   [fd00:10:96::d99f]:80/TCP   ClusterIP
```

Even though the actual redirection happens via BPF/iptables TPROXY, it would be
nice to display the information about the local redirection to the node-local
L7 proxy (Envoy) in some form.

Therefore, this commit changes the frontend model generation to treat the
proxy redirection as backend of the service frontend in the form
`<localhost-addr>:<proxy-port>/<proto> (active)`

Result

```
❯ kubectl -n kube-system exec -it cilium-rdcmj -- bash
root@kind-control-plane:/home/cilium# cilium-dbg service list
ID   Frontend                    Service Type   Backend
1    10.96.62.28:80/TCP          ClusterIP      1 => 10.244.1.149:4245/TCP (active)
2    10.96.0.10:53/TCP           ClusterIP      1 => 10.244.1.69:53/TCP (active)
                                                2 => 10.244.1.109:53/TCP (active)
3    10.96.0.10:53/UDP           ClusterIP      1 => 10.244.1.69:53/UDP (active)
                                                2 => 10.244.1.109:53/UDP (active)
4    10.96.0.10:9153/TCP         ClusterIP      1 => 10.244.1.69:9153/TCP (active)
                                                2 => 10.244.1.109:9153/TCP (active)
5    10.96.0.1:443/TCP           ClusterIP      1 => 172.19.0.2:6443/TCP (active)
12   10.96.162.7:443/TCP         ClusterIP      1 => 172.19.0.2:4244/TCP (active)
15   10.96.106.135:80/TCP        ClusterIP      1 => 10.244.1.115:80/TCP (active)
16   10.96.55.103:80/TCP         ClusterIP      1 => 10.244.1.67:80/TCP (active)
17   [::]:30965/TCP              NodePort       1 => [::1]:14543/TCP (active)
19   [::]:30965/TCP/i            NodePort       1 => [::1]:14543/TCP (active)
21   0.0.0.0:30965/TCP           NodePort       1 => 127.0.0.1:14543/TCP (active)
23   0.0.0.0:30965/TCP/i         NodePort       1 => 127.0.0.1:14543/TCP (active)
25   10.96.245.249:80/TCP        ClusterIP      1 => 127.0.0.1:14543/TCP (active)
26   172.19.255.1:80/TCP         LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
27   172.19.255.1:80/TCP/i       LoadBalancer   1 => 127.0.0.1:14543/TCP (active)
28   [fd00:10:96::d99f]:80/TCP   ClusterIP      1 => [::1]:14543/TCP (active)
```
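
The backend string in the output above follows the `<localhost-addr>:<proxy-port>/<proto> (active)` form. A minimal sketch of that formatting is shown below; the helper name `proxyBackend` and its parameters are hypothetical and not taken from the agent's code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// proxyBackend renders a proxy redirect as a pseudo-backend entry.
// It picks the loopback address matching the frontend's address family.
func proxyBackend(ipv6 bool, proxyPort uint16, proto string) string {
	host := "127.0.0.1"
	if ipv6 {
		host = "::1"
	}
	// JoinHostPort adds the brackets required around IPv6 literals.
	hostPort := net.JoinHostPort(host, strconv.Itoa(int(proxyPort)))
	return fmt.Sprintf("%s/%s (active)", hostPort, proto)
}

func main() {
	fmt.Println(proxyBackend(false, 14543, "TCP"))
	fmt.Println(proxyBackend(true, 14543, "TCP"))
}
```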

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/ipcachelistener: use injected localnodestore

This commit refactors the datapath ipcache BPF listener to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* datapath/linuxnodehandler: retrieve node ips from localnodestore

This commit refactors the `linuxNodeHandler` to fetch the node ips
from the injected `LocalNodeStore` instead of using the global
functions `node.GetIP[v4/v6]`.

Signed-off-by: Marco Hofstetter <[email protected]>

* identity/cache: use injected localnodestore

This commit refactors the identity cache to use
the injected `LocalNodeStore` to retrieve the local node. This is a
preparation to eventually get rid of the global function
`node.GetIP[v4/v6]`.

Due to lack of a context and proper error handling, the refactoring
uses a `context.Background` and fatals in case of not being
able to get the local node (currently it panics).

Signed-off-by: Marco Hofstetter <[email protected]>

* node/address: remove global functions `GetIP[v4/v6]`

This commit removes the unused global functions `GetIPv4` & `GetIPv6`.

Signed-off-by: Marco Hofstetter <[email protected]>

* test: remove K8sDatapathBandwidthTest

The Ginkgo bandwidth test was already not running in CI. It only ran on
net-next kernels, but the focus group that included it
(f10-agent-hubble-bandwidth) was excluded from the net-next job.
Bandwidth enforcement is independently tested in
cilium-cli/connectivity/builder/network_bandwidth_limit.go.

Fixes: #44165
Related: #37837
Signed-off-by: Pavan More <[email protected]>

* wireguard: remove cleanup code for old userspace devices

2578e8ff4dfa ("wireguard: remove deprecated userspace fallback") removed
the userspace mode in v1.17. We can now stop cleaning up any network device
created for that mode.

Signed-off-by: Julian Wiedmann <[email protected]>

* loadbalancer: Check for equality and skip insert when not changed

This adds DeepEqual() implementations for Service and FrontendParams and uses
it to skip inserting the service and updating the frontend if nothing has changed.

As we now don't forcefully update the frontend the orphan detection has to actually
build a set to check if a frontend has been removed.

Benchmarks before:
Benchmark_UpsertServiceAndFrontends_100-6           3549            317691 ns/op            314771 objects/sec
BenchmarkInsertBackend-6                            2818            423975 ns/op            235863 objects/sec
BenchmarkReplaceBackend-6                         326682              3793 ns/op            263669 objects/sec
BenchmarkReplaceService-6                        2327074               509.4 ns/op         1963230 objects/sec

After:
Benchmark_UpsertServiceAndFrontends_100-6                   3464            331791 ns/op            301395 objects/sec
Benchmark_UpsertServiceAndFrontends_100_Unchanged-6        14652             81250 ns/op           1230766 objects/sec
BenchmarkInsertBackend-6                                    2956            401100 ns/op            249315 objects/sec
BenchmarkReplaceBackend-6                                3402430               360.9 ns/op         2771038 objects/sec
BenchmarkReplaceService-6                                2068555               556.6 ns/op         1796743 objects/sec
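
The "skip when unchanged" pattern behind the `_Unchanged` benchmark can be sketched as follows. The `service` and `table` types are simplified, hypothetical stand-ins; the real code uses DeepEqual() implementations on Service and FrontendParams and StateDB tables.

```go
package main

import (
	"fmt"
	"slices"
)

// service is a hypothetical, reduced version of a load-balancer service.
type service struct {
	name  string
	ports []uint16
}

// deepEqual compares all fields, including slice contents.
func (s *service) deepEqual(o *service) bool {
	return s.name == o.name && slices.Equal(s.ports, o.ports)
}

// table is a stand-in for the services table; writes counts real inserts.
type table struct {
	rows   map[string]*service
	writes int
}

// upsert inserts the service unless an equal one is already stored.
// It reports whether a write happened.
func (tb *table) upsert(s *service) bool {
	if old, ok := tb.rows[s.name]; ok && old.deepEqual(s) {
		return false // nothing changed: skip the insert
	}
	tb.rows[s.name] = s
	tb.writes++
	return true
}

func main() {
	tb := &table{rows: map[string]*service{}}
	tb.upsert(&service{name: "default/echo", ports: []uint16{80}})
	tb.upsert(&service{name: "default/echo", ports: []uint16{80}})      // skipped
	tb.upsert(&service{name: "default/echo", ports: []uint16{80, 443}}) // changed
	fmt.Println(tb.writes)
}
```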

Signed-off-by: Jussi Maki <[email protected]>

* loadbalancer: Remove dummy ingress endpoint workaround

Remove the skipping of the 192.192.192.192:9999 dummy ingress
endpoint as the next commit will remove the creation of it.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* operator/helm: Remove creation of dummy ingress endpoint

With the new agent load-balancer implementation the dummy endpoint
is no longer necessary. Now that v1.18 has been released with that
implementation we can remove the creation of the dummy endpoint
without breaking upgrade.

Note: This commit removes the dummy endpoint in the Operator-
(Gateway API & dedicated Ingress) & Helm- (shared Ingress) managed
`EndpointSlice`s. The creation of the `EndpointSlice` itself will
be done in a separate PR.

Fixes: #19262

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Marco Hofstetter <[email protected]>

* monitor: report 3rd argument in DBG_GENERIC debug monitor messages

Currently, in case cilium_dbg3(ctx, DBG_GENERIC, arg1, arg2, arg3) is used in
datapath code, the 3rd argument will not be reported in the monitor
message, e.g.:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac)

Also report the 3rd argument, so the monitor message will look as
follows:

    bpf_test.go:65: CPU 03: MARK 0x0 FROM 0 DEBUG: No message, arg1=238 (0xee) arg2=34214060 (0x20a10ac) arg3=170 (0xaa)

Signed-off-by: Tobias Klauser <[email protected]>

* policy: cleanup label selector parsing and validation

This commit cleans up the parsing and validation of policy label
selectors by removing the dependency on k8s labels library for
validation.

With this change we no longer convert cilium representation
of label keys with source prefixed using ':' delimiter to k8s specific
representation with source prefixed using '.' for validation.
This avoids the additional complexity in policy code to convert between
the two representations. It also simplifies marshalling of selectors as
the objects now have a unified format.

This is not a functional change and does not have any associated user
impact.

Signed-off-by: Deepesh Pathak <[email protected]>

* helm/ztunnel: bind health check to localhost

Security hardening for ztunnel running with hostNetwork: true:

Add host field to readiness probe to bind the health check port 15021
to 127.0.0.1 instead of 0.0.0.0. This reduces attack surface by ensuring
the health check endpoint is only accessible from localhost (kubelet
runs on same node).

Signed-off-by: Quang Nguyen <[email protected]>

* ci:wireguard: enable Host Firewall in native routing e2e tests

This enables Host Firewall in the `wireguard-3` config.
This helps us validate that host-related packets go through the host
firewall hook in `cilium_host` when WireGuard is enabled in native routing.
In overlay, even if we'd miss a redirect we'd see the packet in bpf_overlay,
which will then redirect the packet to bpf_host for HostFW validation.

Signed-off-by: Simone Magnani <[email protected]>

* mcsapi: Add namespace filtering conditions to ServiceImport controller

Add namespace global status checking to the ServiceImport controller:
- Set Invalid condition on ServiceExport when namespace is not global
- Set NamespaceNotGlobal condition on ServiceImport when namespace is not global
- Signal service controller to skip/delete derived Service for non-global namespaces
  by setting SupportedIPFamilies annotation to empty

This implements the MCS-API portion of the ClusterMesh global namespace
filtering feature as described in CFP-39876.

Signed-off-by: Jacques Massa <[email protected]>

* docs: split up network policy language page

Up until now, the page title had been "Layer 3 Examples", which is
a section headline and confusing, since it is among other examples.
Splitting up into several pages, similar to `network/kubernetes/`,
keeps the ToC as it is, and makes it easier to navigate compared to
the lengthy page it was, while also giving each page a suitable
headline.
Since examples are mixed with the language specification, change
headings from "Layer 3 Examples" to "Layer 3 Policies", etc.
Drop the old page and redirect to the overview to keep links working.

Signed-off-by: Daniel Maslowski <[email protected]>

* golangci-lint: fix and simplify golangci-lint.sh

golangci-lint.sh would always rebuild the binary. The script was over-engineered
and didn't use the `golangci-lint version` subcommand, which doesn't need the
version string parsing logic. Simplify the script and fix the rebuild trigger.

Prepare the directory structure for splitting the main .golangci-lint.yaml in a
subsequent commit.

Signed-off-by: Timo Beckers <[email protected]>

* golangci-lint: split kubeapi configuration into separate file

The addition of kubeapilinter forced all users of the main .golangci.yaml to be
running a custom-built version of the tool and make sure it's kept in sync.

VSCode runs `golangci-lint run` in the repo by default and requires explicit
configuration to use another tool, like a script in the Cilium repository.
Before this PR, the script was broken and would always rebuild custom-gcl.

Breaking the default tooling is not great. This commit moves kubeapi-specific
configuration to a dedicated config file in tools/golangci-lint-kubeapi and
specifies it explicitly when starting the kubeapi linter.

Signed-off-by: Timo Beckers <[email protected]>

* policy: Import the ClusterNetworkPolicy v1alpha2 API go modules

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add parser function for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add flags to enable or disable Cluster Network Policy

Disabled by default. A new Makefile target is added that enables it in kind clusters.

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Add watcher for Cluster Network Policy

Signed-off-by: Blaz Zupan <[email protected]>

* policy: Move NetworkPolicy and CNP/CCNP to the 'Normal' tier

Signed-off-by: Blaz Zupan <[email protected]>

* CODEOWNERS: re-assign operator/identitygc, adjust endpoint team description

Assign operator/identitygc to @cilium/sig-policy instead of
@cilium/endpoint based on Joe's feedback. This seems like the more
appropriate team based on the description.

Also update the description for the @cilium/endpoint team to cover
responsibilities for cluster-wide/k8s considerations and endpoint
lifecycle.

Follow-up to commit 11376d5c715b ("CODEOWNERS: add more specific owners
for operator subsystems").

Ref. https://github.com/cilium/cilium/pull/44279#pullrequestreview-3780398303

Suggested-by: Joe Stringer <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>

* nodemap: converted net.IP to netip.Addr, Part of #24246

- Converted net.IP to netip.Addr in node map functions
- Added validation for zero netip.Addr values in newNodeKey
- Wrapped parse errors with %w for better error context
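
A minimal sketch of the three points above, assuming a much reduced `nodeKey` type. Only the name `newNodeKey` comes from the commit message; the field layout, `parseNodeKey` and `errZeroAddr` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// errZeroAddr is a hypothetical sentinel for the zero-value check.
var errZeroAddr = errors.New("node IP is the zero netip.Addr")

// nodeKey is a simplified stand-in for the node map key.
type nodeKey struct {
	addr netip.Addr
}

// newNodeKey rejects the zero netip.Addr, which IsValid reports as invalid.
func newNodeKey(addr netip.Addr) (nodeKey, error) {
	if !addr.IsValid() {
		return nodeKey{}, errZeroAddr
	}
	return nodeKey{addr: addr}, nil
}

// parseNodeKey wraps parse errors with %w so callers can still inspect
// the underlying error with errors.Is or errors.As.
func parseNodeKey(s string) (nodeKey, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return nodeKey{}, fmt.Errorf("parsing node IP %q: %w", s, err)
	}
	return newNodeKey(addr)
}

func main() {
	k, err := parseNodeKey("10.0.0.1")
	fmt.Println(k.addr, err)
	_, err = parseNodeKey("not-an-ip")
	fmt.Println(err)
	_, err = newNodeKey(netip.Addr{})
	fmt.Println(errors.Is(err, errZeroAddr))
}
```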

Signed-off-by: Sanjeevliv <[email protected]>

* bpf/tests: fix byte ordering for TCP seq/win values

The pktgen logic for BPF tests sets default values on TCP packets in
host byte order only. This feeds into the TCP checksum calculation
and while these inputs are technically incorrect, tests that assert
the TCP checksum are correctly testing the invalid output values.

With the introduction of scapy packet definitions, matching these values
to existing tests is difficult because scapy does convert seq and win
values into network byte order. The effect is that L4 sequence numbers
between pktgen and scapy are wrong.

This commit updates pktgen and existing tests that assert these values
to use the appropriate byte conversion routines so that both pktgen
and scapy definitions produce the same results.

This causes all affected BPF tests to fail. This will be addressed
in the next commit.
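
To make the mismatch concrete, the following Go sketch contrasts the bytes a sequence number occupies in network byte order with the bytes a little-endian host would write if the value were stored without conversion. The BPF tests themselves are written in C; the function names here are hypothetical and the little-endian host is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// seqOnWire returns the four header bytes for a TCP sequence number in
// network (big-endian) byte order, which is what scapy emits.
func seqOnWire(seq uint32) [4]byte {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], seq)
	return b
}

// seqUnconverted returns the bytes a little-endian host would place in
// the header if the value were written without a byte-order conversion.
func seqUnconverted(seq uint32) [4]byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], seq)
	return b
}

func main() {
	const seq = 0x01020304
	fmt.Printf("network order: % x\n", seqOnWire(seq))
	fmt.Printf("unconverted:   % x\n", seqUnconverted(seq))
}
```

Because the header bytes differ, any checksum computed over them differs too, which is why the checksum assertions need new expected values.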

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix TCP checksum assertions in all tests

This commit updates every test that asserts TCP checksums to reflect new
values following the change to the default TCP seq/win values, such that
the assertions for this should succeed with a scapy packet definition.

As part of this change, test failure logs have been updated to correctly
show the value observed in a packet and the expected value, both in network
byte order. This is to simplify future debugging. This has also been
applied to UDP checksum assertions for consistency.

Signed-off-by: Alasdair McWilliam <[email protected]>

* bpf/tests: fix default_data definition for scapy tests

The pktgen code defines default_data macro as a string literal which
includes a NUL-terminating character. The size of this string is 20
bytes.

The scapy code defines the same value without the NUL-terminating
character. The size of this string is 19 bytes.

As a result, when converting a test from pktgen to scapy, the packet
length changes, which causes assertion failures because the change leads
to different L3 checksum values.

This commit modifies the scapy default_data variable to include a NUL
terminating character to match pktgen definitions. It is anticipated
this can be removed once all BPF tests are migrated to scapy.

Signed-off-by: Alasdair McWilliam <[email protected]>

* endpoint, fqdn: remove restoration of deprecated V1 DNSRules

Now that 1.16 is EOL, the code restoring PortProto V1 (without protocol)
can be removed.

Signed-off-by: Tobias Klauser <[email protected]>

* endpoint: rename DNS rules field

The V2 suffix is unnecessary now that we stop restoring V1 rules and
there is only a single supported version/field.

Also add a test that verifies that V2 rules are restored and V1 rules
are ignored while retaining backwards compatibility.

Signed-off-by: Tobias Klauser <[email protected]>

* tests: Ignore identity manager related error in versions < 1.18

Due to https://github.com/cilium/cilium/pull/42661 and
https://github.com/cilium/cilium/pull/42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
https://github.com/cilium/cilium/pull/42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>

* metrics: remove agent bootstrap metrics

This commit removes the deprecated agent bootstrap metrics.

Hive job metrics can be used to inspect the duration of the legacy
daemon initialization job. In special cases it might be worth
introducing context-specific metrics - similar to what has been
introduced for the endpoint restoration logic.

Signed-off-by: Marco Hofstetter <[email protected]>

* policy: fix policy tests

This fixes a policy breakage caused by a recent change in how the label
source is handled.

Fixes: 270daa4067 ("policy: Add parser function for Cluster Network Policy")
Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Odin Ugedal <[email protected]>

* resource/test: let TestResource_WithFakeClient set resource version

Update the TestResource_WithFakeClient test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* cid/test: let TestUpdatePodLabels set resource version

Update the TestUpdatePodLabels test to correctly specify the expected
resource version during updates, in preparation for extending the fake
client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* ipam/test: let Test_LocalNodeCIDRsSyncer set resource version

Update the Test_LocalNodeCIDRsSyncer test to correctly specify the
expected resource version during updates, in preparation for extending
the fake client to actually enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* bgp/test: correctly set resource version when updating test resources

Update the bgp tests to correctly specify the expected resource version
during updates, in preparation for extending the fake client to actually
enforce optimistic concurrency control.

Signed-off-by: Marco Iorio <[email protected]>

* test/controlplane: adaptation for optimistic concurrency control

Update the UpdateObjects helper to use the [ObjectTracker.Patch],
instead of [ObjectTracker.Update], in preparation for the subsequent
commit that will make the latter implement optimistic concurrency
control, and validate resource version mismatches, which is not
required in this context.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: fix resource version configuration in tracker

Currently, the object tracker is affected by a bug that causes the
resource version to not be set on creation or update if the object
does not have the [metav1.TypeMeta] set. Indeed, in that case, the
function updating the TypeMeta creates a deep copy of the object,
causing operations performed via [meta.Accessor] to act on the old
copy, and not have effect. Let's get this fixed by changing the
[fillTypeMetaIfNeeded] function to not create a deep copy, given
that it already operates on a copy of the original object.

Signed-off-by: Marco Iorio <[email protected]>

* k8s/client/fake: let update operations respect resource versioning

Currently, the statedb object tracker backing the fake kubernetes client
used for testing purposes does not respect resource versioning, and allows
update operations to succeed regardless of the provided resource version.
While convenient for the `k8s/update` command itself, this approach is
problematic in case of controllers acting on the same resources, as it
can lead to objects being unexpectedly reverted to incorrect versions,
due to the missing optimistic concurrency control.

Let's get this fixed by extending the update implementation to additionally
compare the resource version of the stored and provided objects, and reject
the update in case they do not match, as the real Kubernetes API Server
would do. By default, the k8s/update command still ignores the provided
resource version, letting the update succeed regardless: this matches the
desired behavior in the vast majority of the tests, and avoids the need
for complex operations to set the expected resource version. Still, if
necessary, the stricter behavior can be enabled via the dedicated flag.
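
The update check described above can be sketched as follows. This is not the fake client's implementation: the `tracker` and `object` types are hypothetical, the `strict` field merely models the dedicated flag, and resource versions are plain integers here although Kubernetes treats them as opaque strings.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNotFound = errors.New("object not found")
	errConflict = errors.New("resource version conflict")
)

// object is a hypothetical, minimal stored resource.
type object struct {
	name            string
	resourceVersion uint64
	data            string
}

// tracker is a stand-in for the object tracker behind the fake client.
type tracker struct {
	objects map[string]object
	strict  bool // when true, enforce optimistic concurrency control
}

// update replaces the stored object. In strict mode the caller must
// present the resource version that is currently stored.
func (tr *tracker) update(o object) error {
	cur, ok := tr.objects[o.name]
	if !ok {
		return errNotFound
	}
	if tr.strict && o.resourceVersion != cur.resourceVersion {
		return errConflict
	}
	o.resourceVersion = cur.resourceVersion + 1
	tr.objects[o.name] = o
	return nil
}

func main() {
	tr := &tracker{
		objects: map[string]object{"node-1": {name: "node-1", resourceVersion: 1}},
		strict:  true,
	}
	fmt.Println(tr.update(object{name: "node-1", resourceVersion: 1, data: "a"})) // accepted
	fmt.Println(tr.update(object{name: "node-1", resourceVersion: 1, data: "b"})) // stale
}
```

The second call carries a stale version and is rejected, which is what prevents one controller from silently reverting another controller's write.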

Signed-off-by: Marco Iorio <[email protected]>

* chore(deps): update base-images to v1.26.0

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* chore(deps): update cilium/cilium-cli action to v0.19.1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.6.8

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all lvh-images main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all github action dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update all-dependencies

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <[email protected]>

* test: fix goleak check in combination with script tests

Currently, multiple script tests are intended to validate that no
goroutines are leaked once the tests end, deferring the invocation
of the dedicated [testutils.GoleakVerifyNone] function. However,
the underlying [goleak.VerifyNone] utility is incompatible with
t.Parallel [1], which is set by default by script tests, and no
check is actually performed.

Let's get this fixed by using [goleak.VerifyTestMain] instead, as
also suggested by goleak documentation itself. This commit fixes all
occurrences spotted via:

$ git grep -l GoleakVerifyNone | xargs grep -l testdata

It is worth additionally mentioning that:

* GoleakVerifyTestMain was already invoked in the redirectpolicy
  package, and is thus not added;
* The functions previously ignored in the devices_controller tests
  do not appear to be necessary anymore, and have been omitted; yet,
  we need to additionally ignore one metrics related goroutine that
  is otherwise flagged when IPSec is enabled;
* One of the script tests in the route/reconciler package did not
  correctly stop the hive, causing a few goroutines to be leaked.

Ideally we should have a linter to catch this problem directly
in CI, but that's deferred for the future.

[1]: https://pkg.go.dev/go.uber.org/goleak#VerifyNone

Signed-off-by: Marco Iorio <[email protected]>

* fix(deps): update all go dependencies main

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* chore(deps): update quay.io/cilium/cilium-envoy docker tag to v1.36.5-1770892622-f97ae52c05a1edbbdaa6393f8595431259cf2ca1

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>

* README: Update releases

Signed-off-by: Tim Horner <[email protected]>

* docs: fix duplicate --version in Helm OCI install/upgrade examples

|CHART_VERSION| already expands to '--version <release>'.
Removing the extra literal --version before |CHART_VERSION| so the
rendered CLI is correct (e.g. single '--version 1.19.0').

Signed-off-by: Ghassan Malke <[email protected]>

* gh: e2e-upgrade: skip disk cleanup when workflow is skipped

Most parts of this workflow are skipped when testing patch-level upgrades
in the `main` branch. Also skip the initial disk cleanup, which takes
around 1 minute.

Signed-off-by: Julian Wiedmann <[email protected]>

* bpf:refactor: move ipv{4,6}_host_delivery to local_delivery.h

The two bpf programs `bpf_overlay` and `bpf_wireguard` share the same
logic to redirect a packet to cilium_host@ingress. The mere difference is
that in WireGuard the packet needs to be adjusted to add the Ethernet
layer before doing that, and using __ETH_HLEN rather than ETH_HLEN for
the L3 header offset computation.

Let's move the common logic into `local_delivery.h`, and add comments
to the functions to clarify their purpose.

Signed-off-by: Simone Magnani <[email protected]>

* bpf:refactor:wireguard: remove resolve_srcid_ipv{4,6}

We copied it over from bpf_host, with apposite simplifications for WireGuard.
Having this helper with the same name in both programs is a bit
confusing IMHO. Moving this into `identity.h` would be good, but I checked
most of the codebase and we actually do an inline lookup to retrieve the
`info->sec_identity`. Adapting all LOCs would not be worth it, as in most
cases the `info` pointer is needed for other matters and not only for
retrieving the identity. I have not found a common denominator yet.

For this reason, I opted for simplifying this code in `bpf_wireguard` only.
While doing that, let's also start re-using UNKNOWN_ID as default rather
than WORLD_IPV{4,6}_ID, as we were doing in `bpf_host` before porting over the code.

Signed-off-by: Simone Magnani <[email protected]>

* bpf, datapath: move CIDR identity range macros to bpf/lib/identity.h

This change moves the CIDR identity range macros into bpf/lib/identity.h
and stops emitting CIDR_IDENTITY_RANGE_* defines from the datapath
header writer.

Signed-off-by: viktor-kurchenko <[email protected]>

* renovate: allow update of k8s libraries in stable branches

The intent was always to update the k8s libraries patch versions into
the stable branches but due to a misconfiguration in renovate config
that never happened.

Signed-off-by: André Martins <[email protected]>

* renovate: skip updating sigs.k8s.io/network-policy-api v0.1.8

This tag does not contain a package that exists in a later commit, thus
we should skip it until it gets fixed.

Signed-off-by: André Martins <[email protected]>

* gh: e2e-upgrade: don't hardcode IPsec encryption algorithm

Some e2e configs specify a different encryption algorithm (cbc-aes-sha256).
Have the e2e-upgrade workflow respect this.

Signed-off-by: Julian Wiedmann <[email protected]>

* policy: Pull labels from the CachedSelectionUser

Pull labels from the CachedSelectionUser instead of caching them in the
identitySelector. Caching the rule labels in the identitySelector only
works when the selector is only ever used in a single policy. Pulling the
labels from the registered users can pull the labels from all rules that
are currently using the selector.
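
As a rough sketch of the idea, assuming simplified types: `user` and `selector` below are hypothetical stand-ins and do not reproduce the real `CachedSelectionUser` or `identitySelector` definitions. The point is only that labels are gathered from the currently registered users on demand rather than cached at selector creation.

```go
package main

import (
	"fmt"
	"sort"
)

// user is a hypothetical stand-in for a rule registered on a selector.
type user struct{ ruleLabels string }

// selector is a hypothetical stand-in for a selector shared by several
// rules. It deliberately holds no cached labels of its own.
type selector struct {
	users map[*user]struct{}
}

func newSelector() *selector { return &selector{users: map[*user]struct{}{}} }

func (s *selector) addUser(u *user)    { s.users[u] = struct{}{} }
func (s *selector) removeUser(u *user) { delete(s.users, u) }

// labels collects the labels of every rule currently using the selector.
// The result is sorted so the output is stable across calls.
func (s *selector) labels() []string {
	out := make([]string, 0, len(s.users))
	for u := range s.users {
		out = append(out, u.ruleLabels)
	}
	sort.Strings(out)
	return out
}

func main() {
	sel := newSelector()
	a := &user{ruleLabels: "default/allow-80-8080"}
	b := &user{ruleLabels: "default/l7-rule"}
	sel.addUser(a)
	sel.addUser(b)
	fmt.Println(sel.labels())

	// Once a rule stops using the selector, its labels disappear too.
	sel.removeUser(a)
	fmt.Println(sel.labels())
}
```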

Signed-off-by: Jarno Rajahalme <[email protected]>

* api: Return a label array list for policy selectors

Return a list of labels arrays in the labels field of 'cilium-dbg policy
selectors' response, so that the labels from all the "user" rules can be
returned. With this change the labels field shows the labels from all
"users" rather than just one of them.

Example output of 'cilium-dbg policy selectors' before and after:

Before:

SELECTOR                                                                                                         LABELS                  USERS   IDENTITIES
&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},}                           1       1
&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}                   default/allow-80-8080   2       1
                                                                                                                                                 2
...

After:

SELECTOR                                                                                                         LABELS                                  USERS   IDENTITIES
&LabelSelector{MatchLabels:map[string]string{reserved.host: ,},MatchExpressions:[]LabelSelectorRequirement{},}                                           1       1
&LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{},}                   default/allow-80-8080,default/l7-rule   2       1
                                                                                                                                                                 2
...

Signed-off-by: Jarno Rajahalme <[email protected]>

* Revert "bpf: wire events map rate limits through node config"

This reverts commit b4fed1ddd0b25333eb3831537b9c450a86aa5d02.

Signed-off-by: viktor-kurchenko <[email protected]>

* bgp: Extend Router.GetPeers to return address

We need this to query adj-rib in the new GetRoutes API.

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: Mark existing BGPRouterManager.GetRoutes as legacy

The current RouterManager.GetPeer API returns the API model directly. In
the new CLI, we will rely on the new internal model. Since we still
need to keep the legacy output around for a while, mark it as legacy and
keep it as is. We'll introduce a new implementation in the subsequent
commits.

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: Minor fixes for bgp/peers command

- Remove unnecessary extra P from the function name
- Fix the bug in the Instance and Peer deduplication logic
- Don't sort slice including header

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: Introduce a new BGPRouterManager.GetRoutes

Introduce a new GetRoutes API on BGPRouterManager which returns
BGPv2-native result. The result contains the Instance name that the
route is retrieved from, and the Neighbor name for adj-rib-in and
adj-rib-out.

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: add routes hive command

Introduce bgp/routes to query BGPRouterManager.GetRoutes and render a
structured table for loc-rib and adj-rib tables.

Example output:

loc-rib

```
Instance      Prefix          NextHop    Best   Age
instance0     10.0.0.0/32     0.0.0.0    true   10s
              10.0.0.1/32     0.0.0.0    true   10s
instance1     10.0.0.2/32     0.0.0.0    true   10s
```

adj-rib

```
Instance    Peer    Prefix            NextHop       Age
instance0   peer0   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
            peer1   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
instance1   peer2   10.244.0.0/24     10.99.0.110   10s
                    10.96.50.104/32   10.99.0.110   10s
```

It has optional flags -o (output to file) and --no-age (disable Age
output to make the command output predictable).

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: Add a simple test command to advertise route

The adj-rib-in test requires the peer GoBGP to advertise the route. Add
a simple command to advertise route from test GoBGP instance.

Signed-off-by: Yutaro Hayakawa <[email protected]>

* bgp: Add command output script test scenario

Add a script test scenario that tests the output of the bgp/peers and
bgp/routes commands.

Signed-off-by: Yutaro Hayakawa <[email protected]>

* options: Migrate `VtepCidrMask` from `net.IP` to `netip.Addr`

Related: #24246

Signed-off-by: Hadrien Patte <[email protected]>

* policy: add support for wildcard specifier anywhere in sni pattern

This commit relaxes k8s api validation pattern for server names in
policy api to allow wildcard specifiers anywhere in SNI pattern.
This allows users to write more compressed network policies and is
in line with the syntax supported in FQDN match pattern.

With this change users can now specify allowed server names with
wildcard as:

- '**.cilium.io': Existing behavior which matches any number of
  subdomain levels in the prefix. "test.cilium.io" and
  "test.app.cilium.io" match but "cilium.io" does not.

- '*.cilium.io': Existing behavior which matches all subdomains of
  cilium.io on a single level. "test.cilium.io" matches but
  "test.app.cilium.io" and "cilium.io" do not.

- 'sub*.cilium.io': Matches subdomains of cilium.io where the subdomain
  component begins with "sub" (only one level). "sub.cilium.io" and
  "subdomain.cilium.io" match while "www.cilium.io", "cilium.io" and
  "test.subdomain.cilium.io" do not.

  Additionally this commit introduces a new helper function used to
  sanitize server names pattern when converting to envoy protobuf. This
  is required because cilium-envoy doesn't support the same semantics
  for match pattern syntax as DNS match pattern in cilium-agent.
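
The three pattern forms listed above can be illustrated with a small matcher. This is a sketch under stated assumptions, not the agent's matcher and not the sanitizing helper mentioned in this commit: `compileSNIPattern` and `matchSNI` are hypothetical names, labels are assumed to consist of letters, digits, `-` and `_`, and `**` is only given multi-level meaning as the leading label.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelChars is the assumed set of characters allowed in one DNS label.
const labelChars = `[-a-zA-Z0-9_]`

// compileSNIPattern (hypothetical) turns a server name pattern into an
// anchored regular expression, one label at a time.
func compileSNIPattern(pattern string) (*regexp.Regexp, error) {
	labels := strings.Split(strings.ToLower(pattern), ".")
	parts := make([]string, 0, len(labels))
	for i, l := range labels {
		switch {
		case l == "**" && i == 0:
			// One or more complete labels in the prefix.
			parts = append(parts, "(?:"+labelChars+`+\.)*`+labelChars+"+")
		case l == "*":
			// Exactly one non-empty label.
			parts = append(parts, labelChars+"+")
		default:
			// Literal label; an embedded '*' matches zero or more
			// characters without crossing a dot.
			esc := regexp.QuoteMeta(l)
			parts = append(parts, strings.ReplaceAll(esc, `\*`, labelChars+"*"))
		}
	}
	return regexp.Compile("^" + strings.Join(parts, `\.`) + "$")
}

// matchSNI (hypothetical) reports whether serverName matches pattern.
func matchSNI(pattern, serverName string) bool {
	re, err := compileSNIPattern(pattern)
	if err != nil {
		return false
	}
	return re.MatchString(strings.ToLower(serverName))
}

func main() {
	for _, p := range []string{"**.cilium.io", "*.cilium.io", "sub*.cilium.io"} {
		for _, n := range []string{"cilium.io", "sub.cilium.io", "test.app.cilium.io"} {
			fmt.Printf("%-16s %-20s %v\n", p, n, matchSNI(p, n))
		}
	}
}
```

Anchoring the expression and restricting `*` to label characters is what keeps `sub*.cilium.io` from matching across levels.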

Signed-off-by: Deepesh Pathak <[email protected]>

* cli: add connectivity test for tls sni pattern with random wildcard

Signed-off-by: Deepesh Pathak <[email protected]>

* docs: Fix formatting for install command for GKE Clustermesh

The snippet contains a |CHART_VERSION| directive that is not substituted
when generating the docs, because it's under a "code-block" directive
instead of a "parsed-literal".

Fix the directive, adjust backslashes accordingly, and remove the
redundant "--version" argument (already generated when expanding
|CHART_VERSION|).

Trim trailing white spaces in the file.

Fixes: 63bfe7d8f943 ("Added GKE-to-GKE Clustermesh Preparation guide")
Signed-off-by: Quentin Monnet <[email protected]>

* docs: Fix formatting for command to use Prometheus metrics

The snippet contains a |CHART_VERSION| directive that is not substituted
when generating the docs, because it's under a "code-block" directive
instead of a "parsed-literal".

Fix the directive, adjust backslashes accordingly, and remove the
redundant "--version" argument (already generated when expanding
|CHART_VERSION|).

Trim trailing white spaces in the file.

Fixes: b76f9285bb94 ("docs: add Helm configuration instructions for metrics")
Signed-off-by: Quentin Monnet <[email protected]>

* docs: Fix commands split in parsed-literal blocks throughout docs

For "parsed-literal" blocks, we need a double-backslash at the end of
the line to make a single "\" appear in the generated HTML docs. With a
single one in the source, Sphinx removes the line breaks and leaves
multiple spaces (from the indentation on the next line) instead.

Go through multiple locations (all that I could find) in the docs where
we have parsed-literal blocks with single backslashes to mark a command
split, and adjust with double backslashes instead.

For .../sbom.rst, also add the missing indentation marking the
continuation of the command line.

Trim trailing white spaces, if any, in all edited files.

Signed-off-by: Quentin Monnet <[email protected]>

* bpf/tests/scapy: fix CTX len on pkt len mismatch

Fix error trace showing the incorrect CTX len when packet and buffer
length mismatched.

Signed-off-by: Marc Suñé <[email protected]>

* bpf/tests/scapy: throw build bug if pkts > 1518b

Throw build bug if packets exceed the __SCAPY_MAX_BUF (1518bytes) on
BUF_DECL().

Signed-off-by: Marc Suñé <[email protected]>

* bpf/tests/scapy: cleanup __ASSERT_TRACE_FAIL_BUF()

Remove unnecessary args to __ASSERT_TRACE_FAIL_BUF().

Signed-off-by: Marc Suñé <[email protected]>

* bpf/tests/scapy: support pkts 1036-1518 bytes

Commit e80be9ebff worked around the 128 byte limitation of the
cilium builtins implementation by reimplementing (hack, inefficiently
as noted in the commit msg) a simple version of memcpy/memcmp to be
used by scapy assert checks (only).

Unfortunately, when buffers exceed ~1036 bytes, the clang/LLVM optimizer
removes memcpy code and (attempts to) use built-in instead (even at O1).
Since the builtin has a hard limit of 128 8-byte words [1] (thanks
Daniel Borkmann for the pointer), this leads to:

In file included from _scapy_selftest.c:12:
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
   64 |                 *(__u64 *)(dst + i) = *(__u64 *)(src + i);
      |                                     ^
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.
./scapy.h:64:23: error: A call to built-in function 'memcpy' is not supported.

or if directly using `__bpf_memcpy_builtin()` (which calls
`__builtin_memcpy()`):

In file included from _scapy_selftest.c:7:
In file included from ./pktgen.h:7:
/home/msuneclo/dev/cilium/bpf/include/bpf/builtins.h:165:2: error: A call to built-in function 'memcpy' is not supported.
  165 |         __builtin_memcpy(d, s, len);

This commit works around this issue too by wrapping the original
_scapy_memcpy() to perform chunked memcpys. Current packet size is limited to
1518 bytes (__SCAPY_MAX_BUF), but could be extended to 2K.

[1] https://github.com/llvm/llvm-project/blob/a6929f7937696bb07788be6428fdcf1bf36775b5/llvm/lib/Target/BPF/BPFSelectionDAGInfo.h#L34

Signed-off-by: Marc Suñé <[email protected]>
Reported-by: Simone Magnani <[email protected]>

* script:test:lb: explicit deny ClusterIP access when not enabled

This commit has no functional changes, but it explicitly sets the
`--bpf-lb-external-clusterip=false` flag in the test data, to make it
clear and explicit that we don't expect ClusterIP to be routable.

Signed-off-by: Simone Magnani <[email protected]>

* bpf:test: add IPv4/6 coverage for nodeport non-routable cluster IP

This commit adds `tc_lb{4,6}_nonroutable_clusterip` test that ensures
packets sent from external node to a non-routable ClusterIP service
are dropped with the correct reason code DROP_IS_CLUSTER_IP and
that the metrics are updated correctly.

Signed-off-by: Simone Magnani <[email protected]>

* ginkgo: remove `ClusterIP cannot be accessed externally when access is disabled`

The Ginkgo test verifies that ClusterIP services are not reachable from
external (i.e., nodeWithoutCilium) when the `bpf-lb-external-clusterip` flag is
disabled. However, this behavior is already covered by:

1. `pkg/loadbalancer/tests/testdata/clusterip.txtar`, where in the LB map
   we expect `FLAGS=ClusterIP+sessionAffinity+non-routable` for the service.
2. `tc_lb{4,6}_nonroutable_clusterip.c`, where we verify that an
   incoming packet destined to a service w/o the SVC_FLAG_ROUTABLE is
   being dropped.

Thus, this Ginkgo test can be simply removed as is.

Signed-off-by: Simone Magnani <[email protected]>

* loader: Reduce number of permutations for load-time configs

The runtime is growing exponentially with the number of load-time
permutations we are covering in verifier tests. This is already causing
timeouts in some cases, so let's try to reconsider permutations.

This commit removes coverage for some load-time configs being disabled.
All of these configs are enabled by default and unlikely to be disabled
by users. Even if they were disabled, it's unlikely that action would
increase complexity (on the contrary).

Signed-off-by: Paul Chaignon <[email protected]>

* node/address: refactor GetCiliumEndpointNodeIP

Currently, the global function `node.GetCiliumEndpointNodeIP` uses
the global `LocalNodeStore` instance to retrieve the local node.

In preparation to eventually get rid of the global field that holds the
local node store, this commit refactors the function `GetCiliumEndpointNodeIP`
to expect the local node store to be passed as an argument.

This also allows us to get rid of some test related helper functions and
makes dependencies explicit.

Note: Not all refactored places properly support context propagation
and error handling. For these places, we currently use `context.Background()`
and `logging.Fatal`. This is similar to what already happened under the hood.

Note 2: In a later step we might want to refactor the function into a
method of the `LocalNode`. I hesitate to do it in this commit because
IMO it's more endpoint than node related :/

Signed-off-by: Marco Hofstetter <[email protected]>

* README: Update releases

Signed-off-by: Tim Horner <[email protected]>

* clustermesh: Enable global namespace default configuration option

Expose the `--clustermesh-default-global-namespace` flag that was
previously hidden, allowing users to configure whether namespaces
are treated as global by default in Clustermesh.

Changes:
- Add Helm value `clustermesh.defaultGlobalNamespace`
- Pass flag to clustermesh-apiserver deployment and cilium-config
- Update documentation and schema
- Remove MarkHidden() for the flag in config.go

Signed-off-by: Anubhab Majumdar <[email protected]>

* ci: Add global namespace support for connectivity tests

Add support for testing clustermesh with defaultGlobalNamespace=false:
- Add defaultGlobalNamespace to test non-global namespace behavior in
  conformance-clustermesh.yaml
- Annotate namespaces in deployment.yaml when
  defaultGlobalNamespace=false

Signed-off-by: Anubhab Majumdar <[email protected]>

* ingress/gateway-api: remove EndpointSlice creation

With the removal of the ingress dummy endpoint from the `EndpointSlice`s
that are used for Cilium Ingress & Gateway API, it's possible to get
rid of the need to create and manage these `EndpointSlice`s completely.

This commit removes the respective Helm files, Operator reconciliation logic/watches.

* Gateway API: remove per-Gateway EndpointSlice creation
* Ingress (dedicated): remove per Ingress EndpointSlice creation
* Ingress (shared): remove shared EndpointSlice from Helm

Helm keeps track of installed resources in a k8s secret and will
remove these when updating from a previous version.

The removal of any potential leftovers of Operator-managed `EndpointSlices`
from previous versions on upgrade is handled in the following commit.

Signed-off-by: Marco Hofstetter <[email protected]>

* ingress/gateway-api: cleanup old EndpointSlice

This commit ensures that any `EndpointSlice`s
that have been …
javiercardona-work pushed a commit to javiercardona-work/cilium that referenced this pull request Mar 18, 2026
Due to cilium#42661 and
cilium#42662 not being backported yet to
v1.17, CI fails in the upgrade/downgrade test with this error.
Therefore, we must add it to the ignore list until the PRs are at least
backported to v1.17.

The error was removed from the ignore list in
cilium#42982.

Suggested-by: Marco Iorio <[email protected]>
Suggested-by: Casey Callendrello <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
Labels

area/CI Continuous Integration testing issue or flake cilium-cli This PR contains changes related with cilium-cli ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

Error removing identity not added to the identity manager! on agent init

6 participants