fix(bpf) Initialize cilium_percpu_trace_id map via Hive Start hook #41886

Merged
julianwiedmann merged 1 commit into cilium:main from Bigdelle:trace-hive
Oct 21, 2025

@Bigdelle Bigdelle commented Sep 24, 2025

Description

This PR resolves a race condition during agent startup that caused CI flakes when loading the cilium_percpu_trace_id BPF map.

As noted in #41306 (comment), the original error seen in the logs was:
error="loading eBPF collection into the kernel: map cilium_percpu_trace_id: pin map to /sys/fs/bpf/tc/globals/cilium_percpu_trace_id: file exists"

Cause

The cilium_percpu_trace_id map is defined with the LIBBPF_PIN_BY_NAME flag, making it a globally shared, pinned map. Because multiple datapath components start and load BPF programs concurrently during agent initialization, multiple goroutines were attempting to create and pin this map at the same time. The first attempt succeeds, but every other attempt fails with the "file exists" error quoted above.

Fix

The map's initialization is now done via Hive injection: a dedicated iptrace.Cell is registered in daemon/cmd/cells.go. The cell uses the OnStart hook to call OpenOrCreate() for the cilium_percpu_trace_id map, so the map is created and pinned from a single place.

Fixes: #42125

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 24, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Sep 24, 2025
This commit resolves a rare concurrency flake observed in CI during
agent startup when loading BPF collections.

The `cilium_percpu_trace_id` map is defined with `LIBBPF_PIN_BY_NAME`,
which means it must be initialized and pinned to the BPF filesystem only
once. Previously, this map was created as part of the generic BPF collection
loading mechanism, which could execute concurrently across multiple components
during agent initialization. This led to a race condition.

- The first thread successfully creates and **pins** the map (`/sys/fs/bpf/...`).
- Subsequent concurrent threads fail when trying to **re-pin** the map, resulting
in the `file exists` error and agent failure.

This fix:
- moves the map initialization to a Hive start hook
- adds testing for the new initialization

Signed-off-by: Ben Bigdelle <[email protected]>
@Bigdelle Bigdelle marked this pull request as ready for review September 24, 2025 20:55
@Bigdelle Bigdelle requested review from a team as code owners September 24, 2025 20:55
@pippolo84 pippolo84 added the release-note/misc This PR makes changes that have no direct user impact. label Sep 26, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 26, 2025
@pippolo84 pippolo84 added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Sep 26, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 26, 2025
@pippolo84 pippolo84 left a comment

LGTM, thanks 💯

@pippolo84
/test

@tklauser tklauser added this pull request to the merge queue Oct 20, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 20, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 20, 2025
@julianwiedmann julianwiedmann added this pull request to the merge queue Oct 21, 2025
Merged via the queue into cilium:main with commit 54d2e13 Oct 21, 2025
79 checks passed
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
kind/community-contribution This was a contribution made by a community member.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/misc This PR makes changes that have no direct user impact.

Projects

No open projects
Status: Released

4 participants