Skip to content

bpf:hubble: update trace/drop notify for L2-less packets#37097

Merged
joestringer merged 4 commits intocilium:mainfrom
smagnani96:pr/bpf-l3packet-tracing
Mar 24, 2025
Merged

bpf:hubble: update trace/drop notify for L2-less packets#37097
joestringer merged 4 commits intocilium:mainfrom
smagnani96:pr/bpf-l3packet-tracing

Conversation

@smagnani96
Copy link
Copy Markdown
Contributor

@smagnani96 smagnani96 commented Jan 20, 2025

Currently we're mistakenly parsing packets sent from an L3 device, as our decoders in Hubble/Monitors assume Ethernet as the first layers of the packet. Therefore, the output looks something like:

-> crypto flow 0x0 , identity 53498->unknown state unknown ifindex cilium_wg0 orig-ip 0.0.0.0: 40:00:3f:01:53:89 -> 45:00:00:54:cd:c8 UnknownEthernetType

Let's fix that by borrowing a bit in the trace/drop notify event to tell whether the packet comes from an L3 device or not:

  1. if from L3 device (ex wireguard), then look at ctx->protocol to additionally tell whether IPv4/6 (we need these info to know which parser to use)
  2. if from L2 device, then use the default parser (starting from Ethernet layer).

The output after this patch should look as follows:

# IPv4
## Cilium Monitor
<- crypto flow 0x0 , identity 37044->unknown state unknown ifindex cilium_wg0 orig-ip 0.0.0.0: 10.244.1.88 -> 10.244.2.35 EchoRequest
-> crypto flow 0x0 , identity 54684->unknown state unknown ifindex cilium_wg0 orig-ip 0.0.0.0: 10.244.2.35 -> 10.244.1.88 EchoReply
## Hubble Observe
Jan 31 14:34:38.876: cilium-test-1/client-7d44dd948b-cjh5k (ID:37044) <> cilium-test-1/echo-other-node-79787668d5-xwg9k (ID:54684) from-crypto FORWARDED (ICMPv4 EchoRequest)
Jan 31 14:34:38.877: cilium-test-1/echo-other-node-79787668d5-xwg9k (ID:54684) <> cilium-test-1/client-7d44dd948b-cjh5k (ID:37044) to-crypto FORWARDED (ICMPv4 EchoReply)

# IPv6
## Cilium Monitor
<- crypto flow 0x0 , identity 37044->unknown state unknown ifindex cilium_wg0 orig-ip ::: fd00:10:244:1::3c72 -> fd00:10:244:2::ca3f EchoRequest
-> crypto flow 0xc0318447 , identity 54684->unknown state unknown ifindex cilium_wg0 orig-ip ::: fd00:10:244:2::ca3f -> fd00:10:244:1::3c72 EchoReply
## Hubble Observe
Jan 31 14:26:26.430: cilium-test-1/client-7d44dd948b-cjh5k (ID:37044) <> cilium-test-1/echo-other-node-79787668d5-xwg9k (ID:54684) from-crypto FORWARDED (ICMPv6 EchoRequest)
Jan 31 14:26:26.430: cilium-test-1/echo-other-node-79787668d5-xwg9k (ID:54684) <> cilium-test-1/client-7d44dd948b-cjh5k (ID:37044) to-crypto FORWARDED (ICMPv6 EchoReply)

Fixes: #37095

@smagnani96 smagnani96 added area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. dont-merge/preview-only Only for preview or testing, don't merge it. labels Jan 20, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 20, 2025
@smagnani96 smagnani96 added the release-note/misc This PR makes changes that have no direct user impact. label Jan 20, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 20, 2025
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from e1bb0d2 to 8880c82 Compare January 31, 2025 15:02
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from 8880c82 to 934c2fd Compare February 18, 2025 10:05
@smagnani96 smagnani96 removed the dont-merge/preview-only Only for preview or testing, don't merge it. label Feb 18, 2025
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from 934c2fd to a317f99 Compare February 18, 2025 10:20
@smagnani96 smagnani96 marked this pull request as ready for review February 18, 2025 14:01
@smagnani96 smagnani96 requested review from a team as code owners February 18, 2025 14:01
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from a317f99 to 01f79b3 Compare February 18, 2025 16:15
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@julianwiedmann julianwiedmann added the area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Feb 20, 2025
Copy link
Copy Markdown
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the update scenario? If hubble always upgrades before bpf loading, then it's fine.

@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

Copy link
Copy Markdown
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smagnani96 🙏

Some minor comments, the missing correct decoder selection for DropNotify in the Hubble parser, and a suggestion about drop notification versioning.

@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch 3 times, most recently from 67deeea to 536dfc0 Compare March 19, 2025 12:55
Copy link
Copy Markdown
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smagnani96 for the quick update, patch LGTM now 🎉

The logic duplication between the Hubble parser and monitor dissect is not ideal but still manageable. Long term we probably want to dedup it.

@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from 536dfc0 to f0e5fe0 Compare March 19, 2025 18:58
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from f0e5fe0 to 1ad9f95 Compare March 20, 2025 16:18
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

1 similar comment
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from 1ad9f95 to 0395191 Compare March 21, 2025 09:09
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

GetConnectionInfo is not referenced nor used in any of our sources.
Public API should be `GetConnectionSummary`, which internally calls
`getConnectionInfoFromCache`.

Signed-off-by: Simone Magnani <[email protected]>
The packet decoder in our Cilium Monitor utility never checks for errors
returned from the DecodeLayers function. In Hubble, we do have identical
code, but with the additional checks on that returned error code.
We should try to keep these tool utilities as aligned as possible to avoid
misunderstanding: in case of a decode error, Cilium Monitor would still
try to use the decoded info.

Since v1.1.18, DecodeLayers returns a non-nil error for an empty packet.
Skip decoding and truncate layers to avoid accidental re-use.
See google/gopacket#846

Signed-off-by: Simone Magnani <[email protected]>
This patch enabled the correct decoding of trace notifications when
emitted from an L3 device (ex. Wireguard). Prior to this patch, our
Monitor/Hubble logic always attempted to decode the raw bytes starting
from the Ethernet Layer. This of course doesn't hold true for L3 devices,
where we never see the L2 layer.

Since we already have a flag field in the trace notification with some
unused padding, let's borrow a bit to indicate whether the packet is being
logged from an L3 device or not. This could have been also achieved from
Monitor/Hubble directly, since the notification even has the ifindex, but
it would have required to add logic to keep ifindexes related to L3
interfaces up-to-date in Hubble/Monitor. IMHO, (1) this might not scale
well, and (2) it is much clearer to have all info forwarded from the
datapath. I'm not concerned about performance implication here since
no new data is being forwarded in the trace event.

Signed-off-by: Simone Magnani <[email protected]>
Similarly as the previous commit, this patch enabled the correct decoding
of drop notifications when emitted from an L3 device (ex. Wireguard).

Since we don't have a flag field or padding in drop notification struct,
let's increment the version and add a `__u32` field to hold flags.
For now, we use 1 bit to indicate IPv6 and 1 bit to indicate whether the
packet is being dropped from a L3 Device. The other 30 bits are kept as
padding the preserve the struct alignment.

Signed-off-by: Simone Magnani <[email protected]>
@smagnani96 smagnani96 force-pushed the pr/bpf-l3packet-tracing branch from 0395191 to 58497fd Compare March 21, 2025 14:24
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 22, 2025
@joestringer joestringer added this pull request to the merge queue Mar 24, 2025
Merged via the queue into cilium:main with commit 05192c2 Mar 24, 2025
67 checks passed
@julianwiedmann julianwiedmann added backport/author The backport will be carried out by the author of the PR. needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Apr 30, 2025

#if defined(ENABLE_WIREGUARD) && (defined(IS_BPF_HOST) || defined(IS_BPF_WIREGUARD))
if (THIS_INTERFACE_IFINDEX == WG_IFINDEX) {
l3_dev = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smagnani96 just driving by - can we make this l3_dev = (ETH_HLEN == 0); instead? That would allow us to capture all types of L3 device, not just the wireguard interface.

@github-actions github-actions bot added the backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. label May 8, 2025
@julianwiedmann julianwiedmann removed the needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch label May 10, 2025
@smagnani96 smagnani96 deleted the pr/bpf-l3packet-tracing branch May 21, 2025 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. backport/author The backport will be carried out by the author of the PR. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bpf: tracing: hubble: erroneously parse of L2-less packets in trace/drop notifications

5 participants