Skip to content

CFP 33716: Packet Tracing#44

Merged
joestringer merged 1 commit intocilium:mainfrom
Bigdelle:bigdelle/cfp-distributed-packet-tracing
Aug 16, 2024
Merged

CFP 33716: Packet Tracing#44
joestringer merged 1 commit intocilium:mainfrom
Bigdelle:bigdelle/cfp-distributed-packet-tracing

Conversation

@Bigdelle
Copy link
Copy Markdown
Contributor

@Bigdelle Bigdelle commented Jul 12, 2024

Add PR for CFP: Packet Tracing.

@gandro
Copy link
Copy Markdown
Member

gandro commented Jul 17, 2024

cc @cilium/sig-hubble

@Bigdelle Bigdelle changed the title CFP 33716: Distributed Packet Tracing CFP 33716: Packet Tracing Jul 19, 2024
@Bigdelle Bigdelle closed this Jul 19, 2024
@Bigdelle Bigdelle deleted the bigdelle/cfp-distributed-packet-tracing branch July 19, 2024 19:04
@Bigdelle Bigdelle restored the bigdelle/cfp-distributed-packet-tracing branch July 19, 2024 19:04
@Bigdelle Bigdelle reopened this Jul 19, 2024
@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch from e4eaca2 to a426dda Compare July 19, 2024 20:03
@Bigdelle
Copy link
Copy Markdown
Contributor Author

cc @cilium/sig-datapath

@julianwiedmann
Copy link
Copy Markdown
Member

On a very high-level - you're proposing that the datapath matches some specific IP option in the packet (containing some trace_id), and transfers the trace_id value to userspace via the trace_notify struct ?

As an alternative thought - afaik the datapath is already punting some(?) packet headers to userspace. If you increase the number of punted bytes (so that it includes the IP options), would that allow you to move all the needed IP option-parsing into hubble?

@Bigdelle
Copy link
Copy Markdown
Contributor Author

Bigdelle commented Jul 24, 2024

On a very high-level - you're proposing that the datapath matches some specific IP option in the packet (containing some trace_id), and transfers the trace_id value to userspace via the trace_notify struct ?

As an alternative thought - afaik the datapath is already punting some(?) packet headers to userspace. If you increase the number of punted bytes (so that it includes the IP options), would that allow you to move all the needed IP option-parsing into hubble?

We want the ip_trace_id available at the Kernel level. This PR sets up the infrastructure to expand on this feature as we continue developing it; the next milestone (linked in the Future Milestones section of the CFP) involves restoring the ip_trace_id for reply packets using conntrack.

## Future Milestones

### Deferred Milestone 1: Use conntrack to restore trace ID for reply packets
- When a packet includes a trace ID, the conntrack entry should record the ID along with the connection information.
- For subsequent packets – if a trace ID is not present in the IP Options – use the track ID from conntrack to restore the internal state.
- Note this will not modify the IP option header.

This will ensure that subsequent packets in the same connection maintain their traceability even if they do not carry the ip_trace_id explicitly in their IP options. If we push the IP option parsing to Hubble, we lose this ability.

Also, by pushing this to Hubble, we lose integration with the cilium monitor for events. Considering the future customer use-cases of this feature, we want the ability to see the ip_trace_id in the monitor.

Is there a specific concern you have about including this in the trace_notify and drop_notify structs? (these fields would only be populated if the feature is enabled and there is an embedded ip_trace_id).

@Bigdelle Bigdelle closed this Jul 24, 2024
@Bigdelle Bigdelle reopened this Jul 24, 2024
Copy link
Copy Markdown
Member

@bowei bowei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments -- proposal looks great

@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch 3 times, most recently from 3b11a51 to 4aa290d Compare July 25, 2024 21:25
Copy link
Copy Markdown
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature proposal! It's interesting 😃

Just one question below on the verbose debug case.

@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch from 9915947 to 3809611 Compare August 13, 2024 16:42
@bowei
Copy link
Copy Markdown
Member

bowei commented Aug 13, 2024

CC: @sypakine

@Bigdelle Bigdelle requested a review from pchaigno August 13, 2024 23:10
@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch 2 times, most recently from d6cee8b to 5f359e8 Compare August 16, 2024 17:00
Copy link
Copy Markdown
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this proposal, it sounds useful. I think this PR mainly just needs the metadata at the top to be updated to reflect that this is now considered "Implementable".

Reading through I did wonder about a couple of points, but I think that those points are probably crossing over from the design aspect and more into the implementation aspect, which I don't think needs to be signed off on in order to merge the PR considering it "Implementable". I figured it's just worth highlighting those below since those aspects could potentially come up during implementation / code review.


### Section 4: Event integration

Include ID as event field for trace and drop structs and the flow message (for Hubble integration):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly this makes sense. We probably want to think about whether this uses a new trace_notify_v2 message type (and similarly new message type number + structure for drop). We've done this in the past at least, though it's probably worth digging into the reasoning why we did that to check whether it's still necessary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ This doesn't need to be resolved before merging the CFP as "Implementable". We can figure out those details on the PR.

Copy link
Copy Markdown
Contributor Author

@Bigdelle Bigdelle Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I updated the CFP with relevant changes, please let me know if there's anything else necessary before merging. Thank you!


Mitigation: This feature is a manually enabled debug option, giving users full control over performance overhead of IP option tagged packets. Users can manage overhead and performance concerns related to MTU, GRO compatibility, packet loss, and IP checksum recalculation. By making packet tagging the client's responsibility, Cilium remains unaffected by these concerns, allowing clients to incorporate these considerations into their use of the feature and determine the best way to utilize it according to their specific needs.

## Benchmark Testing
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the additional work over-and-beyond a CFP to put benchmarks in here :) Typically for a CFP this is not necessary, but since you've put it together I don't mind having this in here. Usually I'd expect this to go with the corresponding implementation PR rather than the design, but I know that the review for this design was slow this cycle and you've taken the time to develop the solution during that delayed design review.

@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch 4 times, most recently from f9e5108 to 07c78f9 Compare August 16, 2024 20:57
- Added CFP for packet tracing feature.

Signed-off-by: Ben Bigdelle <[email protected]>
@Bigdelle Bigdelle force-pushed the bigdelle/cfp-distributed-packet-tracing branch from 07c78f9 to e106e1c Compare August 16, 2024 21:05
@joestringer
Copy link
Copy Markdown
Member

I think this CFP is in a solid state for implementing. There's a couple of details that could maybe be hashed out a bit more, but I don't think they ultimately detract that much from the overall proposal, and may be impacted by the experience of implementing it. The proposal has been out plenty long enough for @cilium/sig-hubble and @cilium/sig-datapath to provide feedback, and there's now committers endorsing the proposal, so I think this is in a good enough state to merge.

As and when the implementation matures and impacts the design, we can always iterate further on this CFP, but we don't have to hold it as a PR during that period. Let's get this merged 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants