chore: sync fork with upstream cilium/cilium#20

Merged
jiashengz merged 2110 commits into main from sync/upstream
Feb 25, 2026

Conversation

@github-actions

Automated weekly sync from upstream cilium/cilium.

Upstream is 2110 commit(s) ahead.

cilium-renovate bot and others added 30 commits February 9, 2026 10:59
…0206102632-39e3d06a2850

Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
Its only caller (apart from use in log strings) directly converts the
result to a netip.Prefix. Rather than first constructing a *cidr.CIDR
only to then convert it to netip.Prefix, construct a netip.Prefix right
away. Also rename the method accordingly.

This slightly improves pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 490378kB in total, 2471963 objects / 140742kB still reachable (per service:  49 objs, 10042B alloc,  2882B in-use)
Avg: Allocated 501584kB in total, 2957195 objects / 177049kB still reachable (per service:  59 objs, 10272B alloc,  3625B in-use)
Max: Allocated 543366kB in total, 3722287 objects / 227653kB still reachable (per service:  74 objs, 11128B alloc,  4662B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

Signed-off-by: Tobias Klauser <[email protected]>
This simplifies call sites and avoids an unnecessary conversion in
the reconciler's (*BPFOps).pruneServiceMaps method.

pkg/loadbalancer/benchmark results:

Before:
Memory statistics from N=10 iterations:
Min: Allocated 489191kB in total, 2288754 objects / 127143kB still reachable (per service:  45 objs, 10018B alloc,  2603B in-use)
Avg: Allocated 500813kB in total, 2854354 objects / 168955kB still reachable (per service:  57 objs, 10256B alloc,  3460B in-use)
Max: Allocated 543044kB in total, 3721588 objects / 227563kB still reachable (per service:  74 objs, 11121B alloc,  4660B in-use)

After:
Memory statistics from N=10 iterations:
Min: Allocated 487023kB in total, 1944085 objects /  99675kB still reachable (per service:  38 objs,  9974B alloc,  2041B in-use)
Avg: Allocated 498657kB in total, 2813561 objects / 166829kB still reachable (per service:  56 objs, 10212B alloc,  3416B in-use)
Max: Allocated 542794kB in total, 3722246 objects / 227646kB still reachable (per service:  74 objs, 11116B alloc,  4662B in-use)

Signed-off-by: Tobias Klauser <[email protected]>
The DeleteLB{4,6}By* methods don't have callers anymore since
commit 6fa7f81 ("loadbalancer/legacy: Remove the old
control-plane"). Remove them.

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
Pull in the latest theme with newer docsearch plugin version.

Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Cilium Imagebot <[email protected]>
Use the NativeEndian variable provided by the Go standard library's
encoding/binary package instead of the version from the
netlink/nl package.

While at it, also check the length of the handle returned by
unix.NameToHandleAt before accessing it.

Follow-up to commit 2c1b49c ("byteorder: use binary.NativeEndian")

Signed-off-by: Tobias Klauser <[email protected]>
This commit fixes a cilium-agent panic during datapath reinitialization
when a DirectRouting device is required but not configured.
This can happen when the direct routing device drops, for example during
networkd restart.

```
time=2026-01-14T07:39:46.444386888Z level=info msg="Devices changed" module=agent.datapath.devices-controller devices=[]
time=2026-01-14T07:39:46.444654289Z level=info msg="Fallback node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.44474159Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="127.0.0.1 (primary), ::1 (primary)" device=*
time=2026-01-14T07:39:46.444833191Z level=info msg="Node addresses updated" module=agent.datapath.node-address addresses="" device=eth0
panic: runtime error: index out of range [3] with length 0

goroutine 415 [running]:
github.com/cilium/cilium/pkg/byteorder.NetIPv4ToHost32({0x0?, 0xc000e9e5d0?, 0x49e07bb?})
        /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:15 +0x65
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0xc0004280e0, {0x7ff1517a6ba8, 0xc0023ec400}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:150 +0xa4b
github.com/cilium/cilium/pkg/datapath/loader.hashDatapath({0x50fbfb0, 0xc0004280e0}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/hash.go:20 +0x9e
github.com/cilium/cilium/pkg/datapath/loader.(*objectCache).UpdateDatapathHash(0xc001d027d0, 0xc001422870?)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/cache.go:62 +0x4d
github.com/cilium/cilium/pkg/datapath/loader.(*loader).Reinitialize(0xc002573580, {0x50f9c98, 0xc0008474d0}, 0xc001d00508, {{0x49d15b6, 0x4}, {0x0, 0x0}, 0x0, 0x0, ...}, ...)
        /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:377 +0x3c8
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reinitialize(0xc001d36288, {0x50f9c98?, 0xc0008474d0?}, {{0x0?, 0x0?}, 0x0?}, 0xc001d00508)
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:275 +0x110
github.com/cilium/cilium/pkg/datapath/orchestrator.(*orchestrator).reconciler(0xc001d36288, {0x50f9c98, 0xc0008474d0}, {0x5104260, 0xc002feafc0})
        /go/src/github.com/cilium/cilium/pkg/datapath/orchestrator/orchestrator.go:219 +0x6fd
github.com/cilium/hive/job.(*jobOneShot).start(0xc002082e40, {0x50f9c98, 0xc0008474d0}, 0xc00143dce4?, {0x5104260, 0xc002082de0}, {{{0x0, 0x0, 0x0}}, 0xc001791770, ...})
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*queuedJob).Start.func1 in goroutine 1
        /go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:126 +0x16f
```

With the change in this commit, when a direct routing device is required
but not found, the datapath orchestrator logs a warning and waits for
device updates in the reconciliation loop, skipping reinitialization.
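
The skip-and-wait behaviour can be sketched as follows. This is an illustration only: `reconcileOnce`, its parameters, and the log text are hypothetical and do not mirror the orchestrator's real code.

```go
package main

import (
	"fmt"
	"log"
)

// reconcileOnce is a hypothetical stand-in for one pass of a reconciliation
// loop. If a direct routing device is required but none is known, it logs a
// warning and reports that reinitialization was skipped, so the caller can
// wait for the next device update instead of proceeding with empty data.
func reconcileOnce(requireDevice bool, device string, reinit func() error) (bool, error) {
	if requireDevice && device == "" {
		log.Println("warning: direct routing device required but not found, waiting for device updates")
		return false, nil
	}
	return true, reinit()
}

func main() {
	reinit := func() error {
		fmt.Println("reinitializing datapath")
		return nil
	}

	ran, err := reconcileOnce(true, "", reinit) // device missing: skipped
	fmt.Println(ran, err)

	ran, err = reconcileOnce(true, "eth0", reinit) // device present: runs
	fmt.Println(ran, err)
}
```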

Fixes 8fae439 ("datapath: move DirectRoutingDevice validation to orchestrator")

Signed-off-by: Deepesh Pathak <[email protected]>
Commit 8544737 ("bpf: Workaround for netkit + L7 policy redirect failure")
introduced the enable_netkit load-time config variable.

This commit introduces that variable to BPF loader permutation testing
for bpf_lxc.

Signed-off-by: Alasdair McWilliam <[email protected]>
Add Linux kernel requirement for netkit to the System Requirements.

Signed-off-by: Alasdair McWilliam <[email protected]>
This helper is shared by ICMPv6 and ICMPv4 and will be imported by both
in future refactors.

Signed-off-by: Andrea Terzolo <[email protected]>
The goal is to make these functions reusable for the ICMPv6 policy
denial feature, so move them to a shared icmp6.h file.

Signed-off-by: Andrea Terzolo <[email protected]>
This updates the Gateway API conformance test Make target
to make it not require any extra setup when run as part
of local development with Kind, and adds the ability
to set which tests to run, to allow for focussed
conformance test runs.

Signed-off-by: Nick Young <[email protected]>
DECLARE_CONFIG and NODE_CONFIG only differ in the value of their
respective kind: tag. Avoid code duplication by moving the common parts
to a separate DECLARE_CONFIG_KIND macro and use it to define
DECLARE_CONFIG and NODE_CONFIG. This also allows easier downstream use
of these macros.

Signed-off-by: Tobias Klauser <[email protected]>
Move events map rate/burst limits into node config and read them via
CONFIG(events_map_{rate,burst}_limit) in BPF helpers. This drops the
compile-time EVENTS_MAP* defines from the header writer and cleans up
the legacy defaults in bpf/node_config.h.

Signed-off-by: viktor-kurchenko <[email protected]>
Previously, sysdump collected Hubble UI and Hubble Relay deployments
using hardcoded deployment names ('hubble-ui', 'hubble-relay'). This
caused the collection to fail when users deployed Hubble components
with custom names (e.g., 'hubble-ui-blue'), even when the correct
labels were provided via --hubble-ui-labels or --hubble-relay-labels.

This change updates the deployment collection to use ListDeployment
with the configured label selectors instead of GetDeployment with
hardcoded names. This makes deployment collection consistent with
pod log collection, which already uses label selectors.

Fixes the issue where:
  cilium sysdump --hubble-ui-labels k8s-app=hubble-ui-blue
would still fail with 'Deployment hubble-ui not found'.
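
The difference between name lookup and label selection can be sketched without a cluster. The `deployment` type and `listByLabels` function below are hypothetical simplifications, not the sysdump or Kubernetes client API.

```go
package main

import "fmt"

// deployment is a hypothetical, simplified stand-in for a Kubernetes
// Deployment: only the fields needed for selection are modelled.
type deployment struct {
	Name   string
	Labels map[string]string
}

// matches reports whether labels contains every key/value pair in selector.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// listByLabels returns all deployments matching the selector, regardless of
// what they are named.
func listByLabels(all []deployment, selector map[string]string) []deployment {
	var out []deployment
	for _, d := range all {
		if matches(selector, d.Labels) {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	all := []deployment{
		{Name: "hubble-ui-blue", Labels: map[string]string{"k8s-app": "hubble-ui-blue"}},
		{Name: "hubble-relay", Labels: map[string]string{"k8s-app": "hubble-relay"}},
	}
	// A lookup by the fixed name "hubble-ui" would find nothing here;
	// selecting by label finds the custom-named deployment.
	for _, d := range listByLabels(all, map[string]string{"k8s-app": "hubble-ui-blue"}) {
		fmt.Println(d.Name)
	}
}
```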

Signed-off-by: darox <[email protected]>
There's no code that uses the IPv4 header afterwards.

Signed-off-by: Julian Wiedmann <[email protected]>
Using fib_ok() to evaluate the result of fib_redirect_v*() is a bit
awkward. We're in TC context, so we know that CTX_ACT_TX doesn't need to
be handled (and would most likely lead to a packet loop).

By structuring the code as a switch() statement we can also clean up one
of the goto paths.

Signed-off-by: Julian Wiedmann <[email protected]>
Return the generic value, so that readers understand what macro they should
be using when handling the result.

Signed-off-by: Julian Wiedmann <[email protected]>
This commit moves NAT 46x64 RFC6052 prefix bytes into the node
configuration so BPF programs consume CONFIG(nat_46x64_prefix) value
instead of C header defines.
The Go datapath now populates this field from the NAT46x64 config, and
the headerfile writer no longer emits NAT_46X64_PREFIX_ defines.

Updates included:

- BPF nat_46x64 helpers switched to CONFIG(nat_46x64_prefix).
- Node config population moved to runtime config and legacy defines
  dropped.

Signed-off-by: viktor-kurchenko <[email protected]>
The flag is not used by IPsec. L2 neighbor discovery is enabled whenever
XDP is enabled. This flag allows users to enable it even if XDP is
disabled, so let's state that instead of mentioning IPsec.

Co-authored-by: Dylan Reimerink <[email protected]>
Signed-off-by: Dylan Reimerink <[email protected]>
Signed-off-by: Paul Chaignon <[email protected]>
Some of the operator's subsystems are currently only covered by the
catch-all rule assigning @cilium/operator. Instead, assign more specific
teams to subsystems that require more in-depth knowledge of
these particular areas.

Signed-off-by: Tobias Klauser <[email protected]>
…leanup leaked IAM roles

When multiple parallel jobs generate cluster names within the same
second, they can produce identical names since the timestamp has only
1-second precision. This causes CloudFormation stack creation to fail
with "AlreadyExistsException", leaving orphaned IAM roles behind.

This commit adds a random suffix to cluster names to prevent race conditions
and enhances the failure cleanup step to delete CloudFormation stacks and orphaned
IAM roles when cluster creation fails.

Signed-off-by: André Martins <[email protected]>
This message was modified in k8s 1.35.0, therefore we should update the
list of messages that can be ignored in our CI.

Signed-off-by: André Martins <[email protected]>
MrFreezeex and others added 25 commits February 24, 2026 14:20
The cleanup logic relied on the informer to list the EndpointSlices to
delete. This is racy because the informer is updated asynchronously
through its own watch rather than acting as a write-through cache.

This commit modifies the cleanup logic, which is invoked on cluster
removal, to rely directly on the client instead. The race was
reproducible with the existing endpointslicesync test at a low
rate (~0.1%).

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
Add circuit breaker configuration to both the egress and ingress Envoy clusters to limit retry attempts.

Signed-off-by: Liyi Huang <[email protected]>
cilium#43049 covers the embedded case. This PR covers the external
Envoy use case, using clusterMaxRequests and clusterMaxConnections.

Signed-off-by: Liyi Huang <[email protected]>
The context is unused, remove it.

Signed-off-by: Tobias Klauser <[email protected]>
Cilium checks whether CILIUM_FEATURE_METRICS_WITH_DEFAULTS is set in the
local environment to determine if it should include default metric
values, which is useful for testing and generating documentation. This
variable was introduced in commit [0] and is not intended to be used in
production scenarios.

The metrics package also includes host-specific version information,
such as Kubernetes and Linux kernel version information. This info
should not appear in published documents, and will cause issues if we
want to automatically detect out-of-sync documents in CI.

This commit introduces a similar environment-based mechanism to hide
host-specific versions such as that noted above. If the new variable
CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is set in the local env,
both the agent and the operator will disable version metrics.

Rationale for a new variable was to avoid overloading of the existing
variable, while following the existing precedent for modifying the
behaviour of metrics for development purposes. This also means existing
test mechanisms are unaffected.

The following commit will modify the relevant metrics to be disabled
if version metrics are to be excluded.

[0] 18fcd45 ("pkg/metrics: do not enable all metric defaults")

Signed-off-by: Alasdair McWilliam <[email protected]>
This commit builds on the previous commit by actively disabling the
control-plane Kubernetes version and data-path Linux kernel version
feature metrics when CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is
set in the local environment.

This commit also updates the featuresParams and mockFeaturesParams
interfaces to include a helper function to get the Linux kernel version
as per the existing interface for the Kubernetes version.

Signed-off-by: Alasdair McWilliam <[email protected]>
This commit updates existing logic that tests the handling of Kubernetes
version in the operator, and also adds the same tests for the Linux
kernel version in the agent.

Signed-off-by: Alasdair McWilliam <[email protected]>
Fix the smoke test workflow to fail correctly if feature metric docs
are out of sync.

This has required the workflow to set the new environment variable
CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION when installing, to avoid CI
runner version information being added to the generated documentation
that is then used to compare diffs with. Otherwise, this will always
cause failures.

The variable is passed in extraEnv[3] to avoid conflicts with other
environment variables passed in actions/set-env-variables.

Fixes: eabba03 ("docs: document feature metrics and add rst generator")
Signed-off-by: Alasdair McWilliam <[email protected]>
Re-generate metrics documentation to synchronise with the current metrics
expressed by Cilium, which look to have been out of date for a while.

Signed-off-by: Alasdair McWilliam <[email protected]>
This change migrates CILIUM_NET_IFINDEX from a compile-time constant to
runtime configuration in the BPF layer, while keeping existing behavior
intact.

Signed-off-by: viktor-kurchenko <[email protected]>
This update moves CILIUM_HOST_IFINDEX from a fixed compile-time value to
a runtime-configured setting in the BPF layer while preserving existing
behavior.

Signed-off-by: viktor-kurchenko <[email protected]>
This change removes safenetlink from localnodeconfig and switches to
statedb, which already tracks interface information. It simplifies the
datapath control path by consolidating source-of-truth for network
interfaces and reduces dependency surface while keeping behavior aligned
with existing state data.

The orchestrator now waits for cilium_host and cilium_net to appear in
the devices table before entering the reconciliation loop, avoiding a
startup race with the devices controller. The wait uses device table
watches and the existing limiter to avoid spinning when watches are
unavailable.

Signed-off-by: viktor-kurchenko <[email protected]>
This change moves CILIUM_HOST_MAC from a compile-time constant into
runtime configuration for the BPF datapath while preserving existing
datapath behavior.

Signed-off-by: viktor-kurchenko <[email protected]>
This update migrates CILIUM_NET_MAC from a build-time constant to a
runtime-configured value in the BPF datapath while keeping datapath
behavior consistent.

Signed-off-by: viktor-kurchenko <[email protected]>
Ensure each tracked resource uses a unique job name.
Duplicate job names are harder to debug and produce
errors upon hive termination, such as:

```
msg="failed to delete reporter status tree" module=health
error="reporting for "bgp-control-plane.job-bgpcp-resource-store-events"
has been stopped
```
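
One way to get unique names is to derive each job name from the resource it tracks. The sketch below assumes a hypothetical naming scheme and helper (`jobNameFor`); the resource names are only examples, and the commit's actual naming is not shown here.

```go
package main

import "fmt"

// jobNameFor is a hypothetical helper that derives a per-resource job name
// by appending the resource name to a shared prefix.
func jobNameFor(resource string) string {
	return fmt.Sprintf("bgpcp-resource-store-events-%s", resource)
}

func main() {
	seen := map[string]bool{}
	// Example resource names, used purely for illustration.
	for _, r := range []string{"CiliumBGPPeerConfig", "CiliumBGPAdvertisement"} {
		name := jobNameFor(r)
		if seen[name] {
			panic("duplicate job name: " + name)
		}
		seen[name] = true
		fmt.Println(name)
	}
}
```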

Signed-off-by: Rastislav Szabo <[email protected]>
When using prefix delegation mode, the CreateInterface function was
limiting the number of IPs to allocate based on the per-ENI secondary
IP limit (limits.IPv4-1). This caused the operator to make additional
API calls to AssignPrivateIpAddresses to allocate remaining prefixes.

This change removes the secondary IP limit restriction for prefix
delegation mode, allowing CreateNetworkInterface to request all needed
prefixes in a single API call. This reduces API calls and potential
race conditions during ENI creation.

Signed-off-by: Shiun Chiu <[email protected]>
When the special cil_lxc_policy_egress() tailcall program returns 0, make
it clear that this goes back to the kernel and is handled as CTX_ACT_OK,
which is different from simple "return 0 for success" semantics.

Signed-off-by: Julian Wiedmann <[email protected]>
Migrate `CiliumInternalIPv4` and `CiliumInternalIPv6` from `net.IP` to `netip.Addr` in `LocalNodeConfiguration`.

Related: cilium#24246

Signed-off-by: Hadrien Patte <[email protected]>
Migrate `NodeIPv4` and `NodeIPv6` from `net.IP` to `netip.Addr` in `LocalNodeConfiguration`.

Related: cilium#24246

Signed-off-by: Hadrien Patte <[email protected]>
Migrate `ServiceLoopbackIPv4` and `ServiceLoopbackIPv6` from `net.IP` to `netip.Addr` in `LocalNodeConfiguration`.

Related: cilium#24246

Signed-off-by: Hadrien Patte <[email protected]>
Fixes: 0a7b40c ("k8s: update libraries to v1.35.0-rc.1")
Signed-off-by: Joe Stringer <[email protected]>
This commit fixes the image-tools update. These images, located at
https://github.com/cilium/image-tools, are tagged following a Unix
timestamp format. Since Dec 2025, Renovate has a new option named
"maxMajorIncrement", which is set to 500 by default.

Setting the option to 0 allows unlimited major increments, which works
with our timestamp format.

Associated Renovate PR:
renovatebot/renovate#38854
Associated Renovate issue:
renovatebot/renovate#20772

Signed-off-by: Antony Reynaud <[email protected]>
Move the already existing type used for tests to a dedicated
DesiredVLANDeviceSpec type so it can be used outside of the package.
Also implement the JSON/YAML marshalling method in terms of struct tags
rather than having to construct a map where all fields have to be
repeated.

Signed-off-by: Tobias Klauser <[email protected]>
@jiashengz jiashengz merged commit 87ad6ee into main Feb 25, 2026
@jiashengz jiashengz deleted the sync/upstream branch February 25, 2026 23:35
@jiashengz jiashengz restored the sync/upstream branch February 25, 2026 23:40