ebpf: replace binary.Read with binary.Decode in sysenc.Unmarshal#1713
Conversation
1730e57 to
fefaf4a
Compare
|
Looks good to me, thanks for your contribution! Can you add a test using |
343b82e to
87a8080
Compare
|
@lmb Thanks for the review! I've added |
Yes, I think that would be ideal! No need for a separate test, let's update the existing one to reflect the new situation. Thanks! |
dc9d6e6 to
7c794bb
Compare
|
Thank you for the review @ti-mo! I've applied the requested changes. Could you please take another look? |
Remove allocations when unmarshaling values by using the new binary.Decode API. Signed-off-by: Anton Kolesnikov <[email protected]> Signed-off-by: Anton Kolesnikov <[email protected]>
7c794bb to
d620a32
Compare
lmb
left a comment
There was a problem hiding this comment.
Thanks! Added a comment to the tests and squashed the commits.
Hello team,
First off, thanks for the amazing work on this project! We use it extensively at Grafana, and we truly appreciate the effort that goes into maintaining it.
This PR removes unnecessary allocations in
sysenc.Unmarshalcaused by thebinary.Readcall. We propose usingbinary.Decode(introduced in Go 1.23) as an alternative. While the change may appear minor, its impact can be substantial depending on the specific use case.In the flame graphs below, you can observe one of our services experiencing significant GC pressure, with around 30% of the CPU time spent in
runtime.scanobject. The heap allocations profile further highlights the issue, tracing it to thesysenc.Unmarshalcall within theebpf.(*Map).Lookup->ebpf.unmarshalPerCPUValuecall chain.CPU profile:

Heap allocs profile (alloc_objects):

The profiles were collected over the course of an hour from a fleet of observability agents using Pyroscope and the standard Go pprof endpoints (we also use cilium/ebpf for profiling, though not in this particular case).
Nearly half of all the allocations made by the program occur due to
binary.Readinsysenc.Unmarshal. Whilesysenc.Unmarshalis generally well-optimized and only resorts tobinary.Readas a last step, in our specific case, we miss out on most of these optimizations.The exact locations where allocations occur:
sync.Poold := &decoderis stack-allocated inbinary.DecodeBenchmarks confirm the presence of an allocation in
sysenc.Unmarshal. After applying the optimization from this PR, the benchmark results show zero allocations.Benchmark Results
Thank you for considering this change. Please let me know if you need any additional information or clarifications.