bpf: initial multicast support #29469

Merged
ldelossa merged 4 commits into cilium:main from ldelossa:ldelossa/multicast-datapath
Dec 20, 2023

@ldelossa
Contributor

This pull request introduces initial multicast support within Cilium's datapath.

This pull request is the first milestone for multicast support and sets out to accomplish a minimal viable implementation consisting of:

  1. IGMPv2 and IGMPv3 join/leave parsing
  2. Multicast packet replication and delivery to local and remote subscribers (vxlan tunneling mode only)

The MVP of Multicast has the following constraints:

  1. Kernel version >= 5.13 (for access to necessary eBPF helpers)
  2. Kernel configuration CONFIG_IP_MULTICAST enabled
  3. Multicast group limit of 1024
  4. Multicast subscribers per group limit of 1024
  5. Max 24 group records in an IGMPv3 message

Limits are for MVP purposes and subject to change or become configurable.

Architecture slides:
eBPF Multicast Datapath.pdf

bpf: initial multicast datapath support

@ldelossa ldelossa requested review from a team as code owners November 29, 2023 03:41
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 29, 2023
@ldelossa ldelossa added the release-note/major This PR introduces major new functionality to Cilium. label Nov 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 29, 2023
Contributor

@learnitall learnitall left a comment

Quick question from the sig-api side of things.

Contributor

@ti-mo ti-mo left a comment

Nice work! Left a few nits/questions.

@aanm aanm added dont-merge/blocked Another PR must be merged before this one. dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs and removed dont-merge/blocked Another PR must be merged before this one. labels Dec 4, 2023
@ldelossa ldelossa force-pushed the ldelossa/multicast-datapath branch from be544f9 to e736a34 Compare December 11, 2023 18:10
@ldelossa ldelossa requested a review from a team as a code owner December 11, 2023 18:10
Contributor

@learnitall learnitall left a comment

One small nit, then LGTM. Thanks!

@ldelossa ldelossa force-pushed the ldelossa/multicast-datapath branch from e736a34 to 11bd632 Compare December 11, 2023 22:53
Contributor

@learnitall learnitall left a comment

Looks good from sig-api, thanks!

@ldelossa ldelossa force-pushed the ldelossa/multicast-datapath branch from 11bd632 to 726194f Compare December 12, 2023 18:43
@ldelossa ldelossa force-pushed the ldelossa/multicast-datapath branch 3 times, most recently from 58972ff to 0fdb8c0 Compare December 19, 2023 18:38
@ldelossa ldelossa requested a review from ti-mo December 19, 2023 19:04
| TTL_EXCEEDED | 196 | |
| NO_NODE_ID | 197 | |
| DROP_RATE_LIMITED | 198 | |
| IGMP_HANDLED | 199 | |
Contributor

Hey, one last thing, I'm also claiming 199 for DROP_HOST_NOT_READY here: https://github.com/cilium/cilium/pull/29482/files#diff-78122dbc1bf6a53a08f02c035e071f869eaa123b52822d789b79c04ff1cbdf69R642. Want to claim 200 onwards for IGMP?

Looks like there's a conflict in the alignchecker as well.

Contributor Author

@ldelossa ldelossa Dec 20, 2023

Sure, I can snag 200+ but PR may look a bit weird since it skips until yours is merged?

Contributor Author

On second thought, I would prefer if whoever's PR is merged first deals with the number conflict. I don't think it's great to merge a gap in the sequence into main with the idea that it will be filled at some other time. If the above PR merges first, I have no problem handling the conflict.

Contributor

Yes, except mine is to be backported to 1.14 and 1.15, so if yours lands first the old versions will have a gap :)

Contributor

@ti-mo ti-mo left a comment

Thanks for bearing with me! Left one last comment.

This commit adds the eBPF map used to implement the synthetic multicast
feature.

A `BPF_MAP_TYPE_HASH_OF_MAPS`, which employs a `BPF_MAP_TYPE_HASH`
inner map, is added to the datapath.

The outer eBPF map is keyed by IPv4 multicast group addresses in big
endian format and the values are `BPF_MAP_TYPE_HASH` maps.

The inner hash map associates IPv4 source addresses with their
subscriber multicast metadata.

Each key/value in the inner hash map is a subscriber of the owning
multicast group.
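
The two-level layout described above can be modeled in plain userspace C. This is only a sketch for illustration, not the map definition from this PR: the arrays stand in for the outer and inner eBPF hash maps, lookups are linear for brevity, and the `ifindex` and `is_remote` fields and all function names are hypothetical. The two limits come from the PR description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS      1024 /* group limit stated in the PR description */
#define MAX_SUBSCRIBERS 1024 /* per-group subscriber limit stated in the PR description */

/* Hypothetical value layout; the commit only says "subscriber multicast metadata". */
struct subscriber {
    uint32_t saddr_be;  /* inner key: subscriber IPv4 source address, big endian */
    uint32_t ifindex;   /* assumed: device a packet clone is redirected to */
    uint8_t  is_remote; /* assumed: nonzero when the subscriber is on another node */
};

/* Stands in for one inner BPF_MAP_TYPE_HASH. */
struct group {
    uint32_t group_be; /* outer key: IPv4 multicast group address, big endian */
    uint32_t count;
    struct subscriber subs[MAX_SUBSCRIBERS];
};

/* Stands in for the outer BPF_MAP_TYPE_HASH_OF_MAPS (about 12 MiB, so keep it static). */
struct group_table {
    uint32_t count;
    struct group groups[MAX_GROUPS];
};

/* Outer lookup: group address -> inner map, or NULL if the group is unknown. */
struct group *group_lookup(struct group_table *t, uint32_t group_be)
{
    assert(t != NULL);
    for (uint32_t i = 0; i < t->count; i++)
        if (t->groups[i].group_be == group_be)
            return &t->groups[i];
    return NULL;
}

/* Creates an empty group; returns 0 on success (or if present), -1 when the table is full. */
int group_add(struct group_table *t, uint32_t group_be)
{
    if (group_lookup(t, group_be))
        return 0;
    if (t->count >= MAX_GROUPS)
        return -1;
    t->groups[t->count].group_be = group_be;
    t->groups[t->count].count = 0;
    t->count++;
    return 0;
}

/* Inner lookup: source address -> subscriber metadata within one group. */
struct subscriber *subscriber_lookup(struct group_table *t, uint32_t group_be,
                                     uint32_t saddr_be)
{
    struct group *g = group_lookup(t, group_be);
    if (!g)
        return NULL;
    for (uint32_t i = 0; i < g->count; i++)
        if (g->subs[i].saddr_be == saddr_be)
            return &g->subs[i];
    return NULL;
}

/* Adds or updates a subscriber; fails with -1 if the group does not exist or is full. */
int subscriber_add(struct group_table *t, uint32_t group_be, struct subscriber s)
{
    struct group *g = group_lookup(t, group_be);
    if (!g)
        return -1;
    struct subscriber *existing = subscriber_lookup(t, group_be, s.saddr_be);
    if (existing) {
        *existing = s;
        return 0;
    }
    if (g->count >= MAX_SUBSCRIBERS)
        return -1;
    g->subs[g->count++] = s;
    return 0;
}
```

In this model, `subscriber_add` refuses to create a group implicitly, mirroring the "if it exists" condition in the IGMPv3 commit message: a join only takes effect for a group that was added beforehand.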

Signed-off-by: Louis DeLosSantos <[email protected]>

This commit introduces IGMPv3 detection and parsing.

When bpf_lxc recognizes IGMP messages egressing the Pod we attempt to
parse them.

The parsing logic is as follows:
1. Determine if traffic is IGMP
2. Determine the IGMP message type
3. If the type is not a membership report simply drop it (for now)
4. Parse each group record in the membership report
5. For any group records which indicate a join add a subscriber to the
   multicast subscriber map, if it exists.
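
The parsing steps above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code from this commit: the function name and return convention are hypothetical, the record layout follows RFC 3376, and treating `MODE_IS_EXCLUDE` and `CHANGE_TO_EXCLUDE` records as joins is an assumption about how "join" is classified. Records beyond the 24-record limit from the PR description are simply not parsed here, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_PROTO_IGMP            2    /* IP protocol number of IGMP */
#define IGMPV3_MEMBERSHIP_REPORT 0x22 /* message type, RFC 3376 */
#define IGMPV3_MAX_GROUP_RECORDS 24   /* limit stated in the PR description */

/* Group record types from RFC 3376. */
enum {
    MODE_IS_INCLUDE = 1, MODE_IS_EXCLUDE = 2,
    CHANGE_TO_INCLUDE = 3, CHANGE_TO_EXCLUDE = 4,
    ALLOW_NEW_SOURCES = 5, BLOCK_OLD_SOURCES = 6,
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: collects the groups joined by one IGMPv3 message.
 * Returns -1 if the payload is not IGMP or is malformed, 0 if it is IGMP but
 * not a membership report (step 3: drop), otherwise the number of joins
 * written to `joins`. Groups are returned as host-order integers,
 * e.g. 0xEF010101 for 239.1.1.1.
 */
int igmpv3_collect_joins(uint8_t ip_proto, const uint8_t *msg, size_t len,
                         uint32_t *joins, size_t max_joins)
{
    assert(msg != NULL && joins != NULL);

    if (ip_proto != IP_PROTO_IGMP)           /* step 1: is this IGMP? */
        return -1;
    if (len < 8)                             /* fixed report header is 8 bytes */
        return -1;
    if (msg[0] != IGMPV3_MEMBERSHIP_REPORT)  /* steps 2 and 3 */
        return 0;

    uint16_t nrec = rd16(msg + 6);           /* number of group records */
    if (nrec > IGMPV3_MAX_GROUP_RECORDS)
        nrec = IGMPV3_MAX_GROUP_RECORDS;     /* assumed handling of the limit */

    size_t off = 8, found = 0;
    for (uint16_t i = 0; i < nrec; i++) {    /* step 4: walk each record */
        if (len - off < 8)
            return -1;
        const uint8_t *rec = msg + off;
        /* 8-byte record header + source list + auxiliary data (32-bit words) */
        size_t rec_len = 8 + 4 * (size_t)rd16(rec + 2) + 4 * (size_t)rec[1];
        if (len - off < rec_len)
            return -1;
        /* step 5: assumed join classification */
        if ((rec[0] == MODE_IS_EXCLUDE || rec[0] == CHANGE_TO_EXCLUDE) &&
            found < max_joins)
            joins[found++] = rd32(rec + 4);
        off += rec_len;
    }
    return (int)found;
}
```

A caller would then add one subscriber entry per returned group, skipping groups that are absent from the multicast subscriber map.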

Signed-off-by: Louis DeLosSantos <[email protected]>

This commit adds parsing of IGMPv2 messages in a similar fashion as
IGMPv3 messages.
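
For comparison, an IGMPv2 message is a fixed 8-byte structure carrying a single group address (RFC 2236), so its parsing can be sketched much more compactly. The type and function names below are hypothetical and the checksum is not verified; this is a userspace illustration, not the datapath code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IGMPV2_MEMBERSHIP_REPORT 0x16 /* message type, RFC 2236 */
#define IGMPV2_LEAVE_GROUP       0x17 /* message type, RFC 2236 */
#define IGMPV2_MSG_LEN           8    /* type, max resp time, checksum, group */

enum igmp_action { IGMP_IGNORE = 0, IGMP_JOIN, IGMP_LEAVE };

/* Hypothetical result type for this sketch. */
struct igmp_event {
    enum igmp_action action;
    uint32_t group; /* host-order integer, e.g. 0xEF010101 for 239.1.1.1 */
};

/* Classifies one IGMPv2 message as a join, a leave, or something to ignore. */
struct igmp_event igmpv2_parse(const uint8_t *msg, size_t len)
{
    struct igmp_event ev = { IGMP_IGNORE, 0 };

    assert(msg != NULL);
    if (len < IGMPV2_MSG_LEN)
        return ev;

    switch (msg[0]) {
    case IGMPV2_MEMBERSHIP_REPORT:
        ev.action = IGMP_JOIN;
        break;
    case IGMPV2_LEAVE_GROUP:
        ev.action = IGMP_LEAVE;
        break;
    default:
        return ev; /* queries and other types are not handled in this sketch */
    }

    /* The group address occupies bytes 4..7 in network byte order. */
    ev.group = ((uint32_t)msg[4] << 24) | ((uint32_t)msg[5] << 16) |
               ((uint32_t)msg[6] << 8) | (uint32_t)msg[7];
    return ev;
}
```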

Signed-off-by: Louis DeLosSantos <[email protected]>

This commit implements replication and delivery of multicast packets.

This commit also enables the Cilium datapath to access both the `bpf_clone_redirect`
and `bpf_for_each_map_elem` helpers.

The datapath flow is illustrated below:

┌──────────────────────────────────────────┐
│                                          │
│  Sender                                  │
│  ┌──────┐     ┌─────────┐                │
│  │ pod  ├─────► bpf_lxc │                │
│  └──────┘     └────┬────┘                │
│  Local Receivers   │  eBPF Replication   │
│  ┌──────┐ ┌──────┐ │  and Redirection    │
│  │ pod  ◄─┤ veth ◄─┤(cil_from_container) │
│  └──────┘ └──────┘ │ ┌───────┐           │
│                    ├─► vxlan │           │
│  ┌──────┐ ┌──────┐ │ └───┬───┘           │
│  │ pod  ◄─┤ veth ◄─┘     │               │
│  └──────┘ └──────┘  ┌────┘               │
│                     │                    │
└─────────────────────┼────────────────────┘
                      │
┌─────────────────────┼────────────────────┐
│                     │                    │
│                 ┌───▼───┐                │
│                 │ vxlan │                │
│                 └───┬───┘                │
│   Remote Receivers  │  eBPF Replication  │
│   ┌──────┐ ┌──────┐ │  and Redirection   │
│   │ pod  ◄─┤ veth ◄─┤  (from_overlay)    │
│   └──────┘ └──────┘ │                    │
│                     │                    │
│   ┌──────┐ ┌──────┐ │                    │
│   │ pod  ◄─┤ veth ◄─┘                    │
│   └──────┘ └──────┘                      │
│                                          │
└──────────────────────────────────────────┘

A multicast sender sends a multicast packet.

The sender's bpf_lxc program does a lookup in the multicast group map to
discover who has subscribed to the group.

The program then clones and redirects the packet to each subscriber's
ingress device in the host namespace.

If the subscriber is remote the packet is cloned and redirected to a
vxlan device for encapsulation.

After the host stack forwards the vxlan-encapsulated packet to the
receiving vxlan device on the remote host and the vxlan driver
decapsulates it, a similar "clone and redirect" process is performed there.
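
The sender-side "clone and redirect" loop can be approximated in userspace C. This is a hedged model, not the eBPF program from this commit: `for_each_subscriber` stands in for the kernel's map iteration helper (`bpf_for_each_map_elem`, whose callback returns 0 to continue and 1 to stop), `clone_redirect_stub` stands in for `bpf_clone_redirect` and only records the target device, and the `subscriber` fields and `vxlan_ifindex` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DELIVERIES 16 /* arbitrary bound for this model */

/* Hypothetical subscriber metadata. */
struct subscriber {
    uint32_t saddr;     /* subscriber IPv4 address */
    uint32_t ifindex;   /* local subscriber: ingress device in the host namespace */
    uint8_t  is_remote; /* nonzero: deliver through the vxlan device instead */
};

/* Per-packet state handed to the iteration callback. */
struct replication_ctx {
    uint32_t vxlan_ifindex;               /* assumed tunnel device index */
    uint32_t out_ifindex[MAX_DELIVERIES]; /* devices that received a clone */
    size_t   n_out;
};

/* Stand-in for bpf_clone_redirect(): records the target instead of cloning. */
static int clone_redirect_stub(struct replication_ctx *ctx, uint32_t ifindex)
{
    if (ctx->n_out >= MAX_DELIVERIES)
        return -1;
    ctx->out_ifindex[ctx->n_out++] = ifindex;
    return 0;
}

/* Callback in the style of the kernel helper: 0 continues, 1 stops iteration. */
static long replicate_cb(const struct subscriber *sub, void *data)
{
    struct replication_ctx *ctx = data;
    uint32_t target = sub->is_remote ? ctx->vxlan_ifindex : sub->ifindex;

    return clone_redirect_stub(ctx, target) == 0 ? 0 : 1;
}

/* Stand-in for iterating the group's inner map; returns elements visited. */
static long for_each_subscriber(const struct subscriber *subs, size_t n,
                                long (*cb)(const struct subscriber *, void *),
                                void *data)
{
    long visited = 0;

    for (size_t i = 0; i < n; i++) {
        visited++;
        if (cb(&subs[i], data))
            break;
    }
    return visited;
}

/* Replicates one packet to every subscriber of the looked-up group. */
long replicate(const struct subscriber *subs, size_t n,
               struct replication_ctx *ctx)
{
    assert(subs != NULL && ctx != NULL);
    return for_each_subscriber(subs, n, replicate_cb, ctx);
}
```

In this model every remote subscriber entry produces one clone toward the vxlan device; how the real datapath represents remote subscribers and selects the tunnel endpoint is not described in the commit message.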

Signed-off-by: Louis DeLosSantos <[email protected]>
@ldelossa ldelossa force-pushed the ldelossa/multicast-datapath branch from 0fdb8c0 to 08d0b99 Compare December 20, 2023 15:46
@ldelossa
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 20, 2023
@ldelossa ldelossa added this pull request to the merge queue Dec 20, 2023
Merged via the queue into cilium:main with commit 2afcb61 Dec 20, 2023
@ldelossa ldelossa deleted the ldelossa/multicast-datapath branch December 20, 2023 20:28
@nakabonne

nakabonne commented Feb 20, 2024

Thank you for this support. Has this been released? I poked around the last few releases but can't find this commit.

@joestringer
Member

@ldelossa is this usable in its current state? We may want to revisit the release-note/major and lower that if this is purely the datapath component and further changes are necessary to support common user workflows.

@ldelossa
Contributor Author

ldelossa commented Mar 7, 2024

@joestringer this is purely a datapath implementation with no control plane. Should we update the tag here currently?

@joestringer
Member

joestringer commented Mar 7, 2024

@ldelossa yeah we can do that. It was listed as major for the v1.16.0-pre.0, but if we keep it as "major" then it will be announced again during the v1.16.0 final release. If we don't think it's ready for consumption yet then reducing the release-note would make sense to me. Given that there are currently no user-facing changes, I'd be inclined to set it to release-note/misc since release-note/minor is intended for user-impacting changes. As/when we merge additional PRs to expose this functionality in a way that users can use, we can label those PRs as major and consider whether to reassess this PR's release note as well.

Thanks again for your efforts on this. The reduction in release note is purely intended to help inform our users about the available functionality and is no reflection on the effort required or achievements in this PR. I can see it's quite a complicated bit of work with many nuances so I am glad to have contributors like yourself actively working on solving this use case :-).

@joestringer joestringer added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. and removed release-note/major This PR introduces major new functionality to Cilium. labels Mar 7, 2024
@ldelossa
Contributor Author

Sounds good @joestringer !

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

8 participants