bpf: initial multicast support #29469
learnitall
left a comment
Quick question from the sig-api side of things.
ti-mo
left a comment
Nice work! Left a few nits/questions.
learnitall
left a comment
One small nit, then LGTM. Thanks!
learnitall
left a comment
Looks good from sig-api, thanks!
| | TTL_EXCEEDED | 196 | | |
| | NO_NODE_ID | 197 | | |
| | DROP_RATE_LIMITED | 198 | | |
| | IGMP_HANDLED | 199 | | |
Hey, one last thing, I'm also claiming 199 for DROP_HOST_NOT_READY here: https://github.com/cilium/cilium/pull/29482/files#diff-78122dbc1bf6a53a08f02c035e071f869eaa123b52822d789b79c04ff1cbdf69R642. Want to claim 200 onwards for IGMP?
Looks like there's a conflict in the alignchecker as well.
Sure, I can snag 200+, but the PR may look a bit weird since it skips a number until yours is merged.
On second thought, I would prefer if whoever's PR is merged first deals with the number conflict. I don't think it's great to merge a gap in the sequence into main with the idea that it will be filled at some other time. If the above PR merges first, I have no problem handling the conflict.
Yes, except mine is to be backported to 1.14 and 1.15, so if yours lands first the old versions will have a gap :)
ti-mo
left a comment
Thanks for bearing with me! Left one last comment.
This commit adds the eBPF map used to implement the synthetic multicast feature. A `BPF_MAP_TYPE_HASH_OF_MAPS`, which employs a `BPF_MAP_TYPE_HASH` inner map, is added to the datapath. The outer eBPF map is keyed by IPv4 multicast group addresses in big endian format and the values are `BPF_MAP_TYPE_HASH` maps. The inner hash map associates IPv4 source addresses with their subscriber multicast metadata. Each key/value in the inner hash map is a subscriber of the owning multicast group.
Signed-off-by: Louis DeLosSantos <[email protected]>
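The two-level layout described above can be modelled in plain userspace C. This is only a sketch of the lookup shape (group address to inner table, then source address to subscriber), not the PR's map definitions: the struct names, the fixed-size arrays, the `ifindex` field, and both `MAX_*` limits are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 4       /* illustrative limit, not taken from the PR */
#define MAX_SUBSCRIBERS 8  /* illustrative limit, not taken from the PR */

/* Hypothetical inner-map entry: keyed by the subscriber's IPv4 source address. */
struct subscriber {
    uint32_t saddr;    /* inner key (would be big endian in the datapath) */
    uint32_t ifindex;  /* assumed metadata: device to deliver to */
    uint8_t in_use;
};

/* Hypothetical outer-map entry: keyed by the IPv4 multicast group address. */
struct group {
    uint32_t addr;     /* outer key (would be big endian in the datapath) */
    uint8_t in_use;
    struct subscriber subs[MAX_SUBSCRIBERS];
};

struct mcast_table {
    struct group groups[MAX_GROUPS];
};

/* Outer lookup: group address -> inner table, or NULL if the group is unknown. */
struct group *group_lookup(struct mcast_table *t, uint32_t addr)
{
    for (size_t i = 0; i < MAX_GROUPS; i++)
        if (t->groups[i].in_use && t->groups[i].addr == addr)
            return &t->groups[i];
    return NULL;
}

/* Creates the group if missing. Returns 0 on success, -1 if the table is full. */
int group_create(struct mcast_table *t, uint32_t addr)
{
    if (group_lookup(t, addr))
        return 0;
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (!t->groups[i].in_use) {
            t->groups[i] = (struct group){ .addr = addr, .in_use = 1 };
            return 0;
        }
    }
    return -1;
}

/* Adds or updates a subscriber. Fails with -1 when the group does not exist
 * or the inner table is full. */
int subscriber_add(struct mcast_table *t, uint32_t group_addr,
                   uint32_t saddr, uint32_t ifindex)
{
    struct group *g = group_lookup(t, group_addr);
    struct subscriber *free_slot = NULL;

    if (!g)
        return -1;
    for (size_t i = 0; i < MAX_SUBSCRIBERS; i++) {
        struct subscriber *s = &g->subs[i];
        if (s->in_use && s->saddr == saddr) {
            s->ifindex = ifindex;  /* existing key: update the value */
            return 0;
        }
        if (!s->in_use && !free_slot)
            free_slot = s;
    }
    if (!free_slot)
        return -1;
    *free_slot = (struct subscriber){ .saddr = saddr, .ifindex = ifindex, .in_use = 1 };
    return 0;
}

/* Counts the subscribers currently stored for a group. */
size_t subscriber_count(const struct group *g)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_SUBSCRIBERS; i++)
        n += g->subs[i].in_use ? 1 : 0;
    return n;
}
```

Adding a subscriber to a group that has not been created fails, which mirrors the "if it exists" condition in the IGMP commit message.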
This commit introduces IGMPv3 detection and parsing. When bpf_lxc recognizes IGMP messages egressing the Pod, we attempt to parse them. The parsing logic is as follows:
1. Determine if traffic is IGMP
2. Determine the IGMP message type
3. If the type is not a membership report, simply drop it (for now)
4. Parse each group record in the membership report
5. For any group records which indicate a join, add a subscriber to the multicast subscriber map, if it exists.
Signed-off-by: Louis DeLosSantos <[email protected]>
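The parsing steps above can be sketched as a bounds-checked walk over an IGMPv3 membership report, using the wire layout from RFC 3376. This is not the PR's code: the function name and return convention are hypothetical, and treating an EXCLUDE-mode record with an empty source list as a "join" is an assumption about what the datapath considers a join.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_PROTO_IGMP            2     /* IANA protocol number for IGMP */
#define IGMPV3_MEMBERSHIP_REPORT 0x22  /* RFC 3376 message type */
#define REC_MODE_IS_EXCLUDE      2     /* RFC 3376 group record types */
#define REC_CHANGE_TO_EXCLUDE    4

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: copies the group address (network byte order) of each
 * record treated as a join into groups[]. Returns the number of joins stored,
 * or -1 if the packet is not an IGMPv3 membership report or is truncated. */
int igmpv3_collect_joins(uint8_t ip_proto, const uint8_t *msg, size_t len,
                         uint32_t *groups, size_t max_groups)
{
    size_t off = 8;  /* report header: type, reserved, checksum, reserved, count */
    int joins = 0;

    if (ip_proto != IP_PROTO_IGMP)           /* step 1: is this IGMP? */
        return -1;
    if (len < 8 || msg[0] != IGMPV3_MEMBERSHIP_REPORT)
        return -1;                           /* steps 2-3: caller drops it */

    uint16_t nrec = read_be16(msg + 6);
    for (uint16_t i = 0; i < nrec; i++) {    /* step 4: walk group records */
        if (len - off < 8)
            return -1;
        const uint8_t *rec = msg + off;
        uint8_t type = rec[0];
        size_t aux_words = rec[1];           /* aux data length in 32-bit words */
        size_t nsrc = read_be16(rec + 2);
        size_t rec_len = 8 + 4 * nsrc + 4 * aux_words;

        if (len - off < rec_len)
            return -1;
        /* step 5 (assumed join rule): EXCLUDE mode with no sources */
        if ((type == REC_MODE_IS_EXCLUDE || type == REC_CHANGE_TO_EXCLUDE) &&
            nsrc == 0 && (size_t)joins < max_groups) {
            memcpy(&groups[joins], rec + 4, 4);
            joins++;
        }
        off += rec_len;
    }
    return joins;
}
```

A caller would then add one subscriber per returned group, skipping groups that have no entry in the multicast group map.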
This commit adds parsing of IGMPv2 messages in a similar fashion as IGMPv3 messages.
Signed-off-by: Louis DeLosSantos <[email protected]>
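For comparison, an IGMPv2 message is a fixed 8-byte structure (RFC 2236) carrying a single group address, so classification is simpler than the IGMPv3 record walk. The sketch below is illustrative only: the enum and function name are hypothetical, and the commit message does not say whether the datapath acts on Leave Group messages.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IGMPV2_MEMBERSHIP_REPORT 0x16  /* RFC 2236 message types */
#define IGMPV2_LEAVE_GROUP       0x17

/* Hypothetical classification result for this sketch. */
enum igmpv2_action { IGMPV2_IGNORE = 0, IGMPV2_JOIN = 1, IGMPV2_LEAVE = 2 };

/* Classifies an IGMPv2 message and copies its group address (network byte
 * order) into *group. Layout: type, max response time, checksum, group. */
enum igmpv2_action igmpv2_classify(const uint8_t *msg, size_t len, uint32_t *group)
{
    if (len < 8)
        return IGMPV2_IGNORE;
    memcpy(group, msg + 4, 4);
    switch (msg[0]) {
    case IGMPV2_MEMBERSHIP_REPORT:
        return IGMPV2_JOIN;
    case IGMPV2_LEAVE_GROUP:
        return IGMPV2_LEAVE;  /* assumption: surfaced to the caller, not acted on here */
    default:
        return IGMPV2_IGNORE;
    }
}
```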
This commit implements replication and delivery of multicast packets.
This commit also enables the Cilium datapath to access both the `bpf_clone_redirect`
and `bpf_for_each_map_elem` helpers.
The datapath flow is illustrated below:
┌──────────────────────────────────────────┐
│                                          │
│            Sender                        │
│   ┌──────┐     ┌─────────┐               │
│   │ pod  ├─────► bpf_lxc │               │
│   └──────┘     └────┬────┘               │
│   Local Receivers   │ eBPF Replication   │
│   ┌──────┐ ┌──────┐ │ and Redirection    │
│   │ pod  ◄─┤ veth ◄─┤(cil_from_container)│
│   └──────┘ └──────┘ │ ┌───────┐          │
│                     ├─► vxlan │          │
│   ┌──────┐ ┌──────┐ │ └───┬───┘          │
│   │ pod  ◄─┤ veth ◄─┘     │              │
│   └──────┘ └──────┘ ┌─────┘              │
│                     │                    │
└─────────────────────┼────────────────────┘
                      │
┌─────────────────────┼────────────────────┐
│                     │                    │
│                 ┌───▼───┐                │
│                 │ vxlan │                │
│                 └───┬───┘                │
│   Remote Receivers  │ eBPF Replication   │
│   ┌──────┐ ┌──────┐ │ and Redirection    │
│   │ pod  ◄─┤ veth ◄─┤ (from_overlay)     │
│   └──────┘ └──────┘ │                    │
│                     │                    │
│   ┌──────┐ ┌──────┐ │                    │
│   │ pod  ◄─┤ veth ◄─┘                    │
│   └──────┘ └──────┘                      │
│                                          │
└──────────────────────────────────────────┘
1. A multicast sender sends a multicast packet.
2. The sender's bpf_lxc program does a lookup in the multicast group map to
   discover who has subscribed to the group.
3. The program then clones and redirects the packets to the subscriber's
   ingress device on the host namespace.
4. If the subscriber is remote, the packet is cloned and redirected to a
   vxlan device for encapsulation.
5. Once the host stack forwards the vxlan encap'd packet to the receiving
   vxlan device on the remote host, a similar "clone and redirect" process
   is performed once the vxlan driver decaps the packet.
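The "clone and redirect" loop in this flow can be sketched in userspace C. This is a simulation under stated assumptions, not the datapath code: `clone_redirect` is a logging stand-in for the kernel's `bpf_clone_redirect` helper, the plain `for` loop stands in for per-element map iteration, and the `is_remote` flag and one-clone-per-remote-subscriber behaviour are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REDIRECTS 16  /* illustrative bound for the log below */

/* Hypothetical subscriber metadata, as stored per group. */
struct subscriber {
    uint32_t saddr;      /* subscriber's IPv4 source address */
    uint32_t ifindex;    /* local delivery device */
    uint8_t is_remote;   /* assumed flag: subscriber lives on another node */
};

/* Records where each clone was sent, so the sketch is observable. */
struct redirect_log {
    uint32_t ifindex[MAX_REDIRECTS];
    size_t n;
};

/* Stand-in for bpf_clone_redirect: logs the target instead of sending. */
static void clone_redirect(struct redirect_log *log, uint32_t ifindex)
{
    if (log->n < MAX_REDIRECTS)
        log->ifindex[log->n++] = ifindex;
}

/* For each subscriber of the group, clone the packet and redirect it to the
 * subscriber's device, or to the vxlan device if the subscriber is remote.
 * Returns the number of clones logged so far. */
size_t replicate(const struct subscriber *subs, size_t nsubs,
                 uint32_t vxlan_ifindex, struct redirect_log *log)
{
    for (size_t i = 0; i < nsubs; i++) {
        uint32_t target = subs[i].is_remote ? vxlan_ifindex : subs[i].ifindex;
        clone_redirect(log, target);
    }
    return log->n;
}
```

On the receiving node the same loop would run again after decapsulation, this time over that node's local subscribers only.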
Signed-off-by: Louis DeLosSantos <[email protected]>
/test

Thank you for this support. Has this been released? I poked around a few of the latest releases but can't find this commit.

@ldelossa is this usable in its current state? We may want to revisit the

@joestringer this is purely a datapath implementation with no control plane. Should we update the tag here currently?

@ldelossa yeah we can do that. It was listed as major for the v1.16.0-pre.0, but if we keep it as "major" then it will be announced again during the v1.16.0 final release. If we don't think it's ready for consumption yet then reducing the release-note would make sense to me. Given that there's no current user-facing changes, I'd be inclined to set it to

Thanks again for your efforts on this. The reduction in release note is purely intended to help inform our users about the available functionality and is no reflection on the effort required or achievements in this PR. I can see it's quite a complicated bit of work with many nuances so I am glad to have contributors like yourself actively working on solving this use case :-).

Sounds good @joestringer !
This pull request introduces initial multicast support within Cilium's datapath.
This pull request is the first milestone for Multicast support and sets out to accomplish a minimum viable implementation consisting of:
The MVP of Multicast has the following constraints:
Limits are for MVP purposes and subject to change or become configurable.
Architecture slides:
eBPF Multicast Datapath.pdf