btf: lazy decoding by lmb · Pull Request #1763 · cilium/ebpf

lmb · 2025-04-23T17:10:01Z

Prompted by #1755 another stab at lazy decoding of BTF types. IterateVmlinux is roughly the pwru workload (IIRC) and InspectorGadget does what it says on the tin.

core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
                │  base.txt   │          eager-strings.txt           │
                │   sec/op    │   sec/op     vs base                 │
ParseVmlinux      80.59m ± 4%   26.95m ± 2%  -66.56% (p=0.002 n=6)
IterateVmlinux    193.7m ± 5%   151.8m ± 2%  -21.65% (p=0.002 n=6)
InspectorGadget   87.22m ± 1%   34.56m ± 2%  -60.37% (p=0.002 n=6)
BuildVmlinux      145.7m ± 4%
geomean           118.7m        52.09m       -53.00%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                │   base.txt    │           eager-strings.txt           │
                │     B/op      │     B/op      vs base                 │
ParseVmlinux      24.427Mi ± 0%   9.970Mi ± 0%  -59.18% (p=0.002 n=6)
IterateVmlinux     52.51Mi ± 0%   34.71Mi ± 0%  -33.90% (p=0.002 n=6)
InspectorGadget    26.65Mi ± 0%   12.00Mi ± 0%  -54.97% (p=0.002 n=6)
BuildVmlinux       42.22Mi ± 0%
geomean            34.66Mi        16.07Mi       -50.47%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                │  base.txt   │          eager-strings.txt           │
                │  allocs/op  │  allocs/op   vs base                 │
ParseVmlinux      271.9k ± 0%   146.1k ± 0%  -46.27% (p=0.002 n=6)
IterateVmlinux    398.9k ± 0%   272.9k ± 0%  -31.60% (p=0.002 n=6)
InspectorGadget   280.8k ± 0%   155.0k ± 0%  -44.80% (p=0.002 n=6)
BuildVmlinux      1.508k ± 0%
geomean           82.32k        183.5k       -41.24%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

lmb · 2025-04-23T17:10:22Z

@brb could you try this in pwru and see how bad things are?

Signed-off-by: Martynas Pumputis <[email protected]>

brb · 2025-04-28T09:15:24Z

@lmb Are you interested in perf benchmarks, or correctness? If the latter, then all good - cilium/pwru#548. Unfortunately, we do not have any benchmarking suites.

lmb · 2025-04-28T09:26:56Z

I added a crude benchmark which just iterates all of vmlinux to simulate pwru. Even that has gotten faster!

Add a benchmark which replicates the types used by Inspektor Gadget for a common configuration. Also add a benchmark which explicitly iterates all types in vmlinux, which is similar to what pwru does. See cilium#1755 (comment) Signed-off-by: Lorenz Bauer <[email protected]>

Stop relying on readAndInflateTypes by creating a full BTF blob in the test. Signed-off-by: Lorenz Bauer <[email protected]>

Signed-off-by: Lorenz Bauer <[email protected]>

mutableTypes is going to be removed when switching to lazy BTF parsing. Move the implementation of AnyTypesByName to Spec. Signed-off-by: Lorenz Bauer <[email protected]>

Lazy decoding of BTF means that iterating over a Spec may produce errors. The current iterator interface doesn't allow returning errors. Replace Iterate() with a real iterator which calls into Spec.TypeByID instead of going via an internal fast path. This also allows moving Datasec fixups in the future. Signed-off-by: Lorenz Bauer <[email protected]>

Some Datasec are not correctly populated when parsing BTF from an ELF file (vs. raw vmlinux BTF). To address this we currently do a full pass over all types to find Datasec that need fixing up. Instead, perform the fixup when retrieving a Type from a Spec. Spec loaded from a raw BTF blob are kept unchanged. Signed-off-by: Lorenz Bauer <[email protected]>

Move the implementation of readAndInflateTypes into a new file. No other changes intended. Signed-off-by: Lorenz Bauer <[email protected]>

Switch BTF decoding from operating on a reader to operating on a byte slice. This simplifies the code and removes the need for a bunch of intermediate buffers. It's also necessary to be able to lazily decode BTF in the future. Signed-off-by: Lorenz Bauer <[email protected]>

Break up readAndInflateTypes into a dedicated decoder type. Constructing the decoder does a single pass over the BTF which collects the offsets of all types in the blob. Signed-off-by: Lorenz Bauer <[email protected]>

Refactor the decoder so that it is possible to unmarshal one type at a time. The main challenges are: * Unmarshaling a type requires unmarshaling all descendants. This is solved by recursively inflating types until reaching a leaf. Care needs to be taken to avoid infinite recursion when dealing with self-referential types like linked list headers. The trick here is that we store a partially inflated Type in d.types so that later function calls return the cached value. The partially inflated Type is removed from d.types on error to avoid polluting the decoder with "broken" types. * Unmarshaling a type may require decoding declTags pointing at that type. This is handled by maintaining a lookaside index which maps type IDs to declTag typeIDs. * Unmarshaling a legacy bitfield also requires a lookaside data structure. Even though the decoder is now capable of lazy decoding we still construct a full []Type when parsing BTF to keep the diff manageable. Signed-off-by: Lorenz Bauer <[email protected]>

Use the lazy decoder to avoid decoding all types upfront. This requires adding an index from essentialName to TypeID to be able to implement AnyTypeByName. decoder has to be safe for concurrent use because we want to share the raw BTF blob and so on when copying a Spec. Signed-off-by: Lorenz Bauer <[email protected]>

It turns out that doing two passes over the encoded BTF is basically free. Use this approach to pre-calculate the correct size for the data structures required by the decoder. This reduces allocations and copying by a substantial amount. Signed-off-by: Lorenz Bauer <[email protected]>

We still construct full slices of various raw BTF types, only to discard them afterwards. Instead, decode only a single raw type and accumulate the inflated equivalents. Signed-off-by: Lorenz Bauer <[email protected]>

Appending to a nil slice allocates a slice with an 8 item capacity as of Go 1.24. This is wasteful since most entries will only ever need a single ID. Do some syntax gymnastics to avoid this waste. Signed-off-by: Lorenz Bauer <[email protected]>

dylandreimerink

🚀

lmb mentioned this pull request Apr 23, 2025

[PoC] Introduce new flag SpecCacheDisabled & Parse only the requires BTF types #1755

Closed

brb added a commit to cilium/pwru that referenced this pull request Apr 28, 2025

WIP: try out cilium/ebpf#1763

85b4503

Signed-off-by: Martynas Pumputis <[email protected]>

lmb force-pushed the btf-lazy-final-final-v2 branch from def29e7 to afa4ee7 Compare April 28, 2025 09:25

lmb force-pushed the btf-lazy-final-final-v2 branch 3 times, most recently from 3ba613d to f9c4803 Compare April 28, 2025 11:25

lmb marked this pull request as ready for review April 28, 2025 11:27

lmb requested review from a team and dylandreimerink as code owners April 28, 2025 11:27

lmb mentioned this pull request Apr 28, 2025

Load kernel BTF spec once when creating a new collection #1690

Closed

ti-mo mentioned this pull request Apr 29, 2025

btf: LoadSpecFromReader: add filter param to optimize memory #1589

Closed

lmb added 14 commits April 30, 2025 15:56

btf: use loadRawSpec in TestInflateLegacyBitfield

14c03b6

Stop relying on readAndInflateTypes by creating a full BTF blob in the test. Signed-off-by: Lorenz Bauer <[email protected]>

btf: remove parseBTF

4488c1c

Signed-off-by: Lorenz Bauer <[email protected]>

btf: move AnyTypesByName to Spec

54b8d05

mutableTypes is going to be removed when switching to lazy BTF parsing. Move the implementation of AnyTypesByName to Spec. Signed-off-by: Lorenz Bauer <[email protected]>

btf: move readAndInflateTypes

1799301

Move the implementation of readAndInflateTypes into a new file. No other changes intended. Signed-off-by: Lorenz Bauer <[email protected]>

btf: introduce decoder

2691e01

Break up readAndInflateTypes into a dedicated decoder type. Constructing the decoder does a single pass over the BTF which collects the offsets of all types in the blob. Signed-off-by: Lorenz Bauer <[email protected]>

btf: reduce memory usage by skipping intermediate slices

864015f

We still construct full slices of various raw BTF types, only to discard them afterwards. Instead, decode only a single raw type and accumulate the inflated equivalents. Signed-off-by: Lorenz Bauer <[email protected]>

btf: reduce memory usage of namedTypes

ef22608

Appending to a nil slice allocates a slice with an 8 item capacity as of Go 1.24. This is wasteful since most entries will only ever need a single ID. Do some syntax gymnastics to avoid this waste. Signed-off-by: Lorenz Bauer <[email protected]>

lmb force-pushed the btf-lazy-final-final-v2 branch from f9c4803 to ef22608 Compare April 30, 2025 14:56

dylandreimerink approved these changes May 1, 2025

View reviewed changes

lmb merged commit c03bd45 into cilium:main May 1, 2025
17 checks passed

lmb deleted the btf-lazy-final-final-v2 branch May 1, 2025 14:04

burak-ok mentioned this pull request May 2, 2025

Typo: BenchmarkInspectorGadget -> BenchmarkInspektorGadget #1773

Merged

burak-ok mentioned this pull request May 30, 2025

deps: Update to go-ebpf version with lower memory usage inspektor-gadget/inspektor-gadget#4541

Merged

alban mentioned this pull request Jun 18, 2025

socketenricher: Use CO-RE to make it resilient against changes inspektor-gadget/inspektor-gadget#4543

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

btf: lazy decoding#1763

btf: lazy decoding#1763
lmb merged 14 commits intocilium:mainfrom
lmb:btf-lazy-final-final-v2

lmb commented Apr 23, 2025 •

edited

Loading

Uh oh!

lmb commented Apr 23, 2025

Uh oh!

brb commented Apr 28, 2025

Uh oh!

lmb commented Apr 28, 2025

Uh oh!

dylandreimerink left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

lmb commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lmb commented Apr 23, 2025

Uh oh!

brb commented Apr 28, 2025

Uh oh!

lmb commented Apr 28, 2025

Uh oh!

dylandreimerink left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lmb commented Apr 23, 2025 •

edited

Loading