[PoC] Introduce new flag SpecCacheDisabled & Parse only the requires BTF types #1755
burak-ok wants to merge 1 commit into cilium:main
This flag disables the caching of the BTF, which reduces the memory footprint. Furthermore, only the needed symbols are parsed out of the BTF instead of reading and interpreting everything. Signed-off-by: Burak Ok <[email protected]> Co-authored-by: Alban Crequy <[email protected]>
Seems like the vmlinux spec keeps being a problem! Just checking: when you say memory usage, do you mean heap at idle? Does calling https://pkg.go.dev/github.com/cilium/ebpf/btf#FlushKernelSpec not help? How would you determine which types to parse from vmlinux?
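To make the idle versus peak distinction in this question concrete, here is a minimal sketch using only the Go standard library. It does not call cilium/ebpf: the package-level `cache` variable and the `loadAll`, `flush`, and `measure` helpers are hypothetical stand-ins for a cached kernel spec and its flush function.

```go
package main

import (
	"fmt"
	"runtime"
)

// cache stands in for a process-wide cached kernel spec (hypothetical).
var cache []byte

// heapInUse forces a garbage collection and returns the bytes of live heap objects.
func heapInUse() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// loadAll simulates parsing everything up front and keeping the result cached.
func loadAll(size int) {
	cache = make([]byte, size)
	for i := range cache {
		cache[i] = byte(i) // touch the memory so it is really in use
	}
}

// flush drops the cache so the garbage collector can reclaim it.
func flush() { cache = nil }

// measure returns heap usage while the cache is held and again after flushing.
func measure(size int) (held, idle uint64) {
	loadAll(size)
	held = heapInUse()
	flush()
	idle = heapInUse()
	return held, idle
}

func main() {
	held, idle := measure(32 << 20)
	fmt.Printf("cache held: %d MiB, after flush: %d MiB\n", held>>20, idle>>20)
}
```

Flushing lowers the heap at idle, but the process still had to hold the full allocation at some point, which is what a container memory limit would see.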
Yes, that would help lower the heap usage after the program has started. But if one sets a low memory limit in a pod spec, one also needs to avoid high memory usage during initialization, i.e. while loading every program. Furthermore with
For that we are reading the all
@lmb To me it sounds like lazy-decoding could've been a better avenue to explore after all? Not sure how (in)feasible it is today, but iirc we had btf.Spec.Add() back in the day which was a blocker. Now we have btf.Builder, we could technically, hypothetically, make btf.Spec a querying layer over an encoded btf blob and only inflate what's queried, and cache the results to enable type comparisons. Or, implement comparers on all types if we don't want to cache anything. Seems like some (most?) users care more about keeping both resident and peak memory usage low rather than speed.
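The "querying layer over an encoded blob" idea can be sketched roughly as follows. This is not the cilium/ebpf implementation: `LazySpec`, `Type`, and `buildBlob` are hypothetical names, and the blob is simplified to fixed-size 12-byte records (name offset, info, size-or-type) indexed by ID, whereas real BTF records carry variable-length trailing data and would need an offset index first.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const recordSize = 12 // simplified: header only, no kind-specific trailing data

// Type is one inflated record.
type Type struct {
	ID         uint32
	NameOff    uint32
	Kind       uint8
	SizeOrType uint32
}

// LazySpec keeps the encoded blob and inflates records only when queried.
type LazySpec struct {
	blob    []byte
	cache   map[uint32]*Type
	decoded int // how many records were actually inflated
}

func NewLazySpec(blob []byte) *LazySpec {
	return &LazySpec{blob: blob, cache: make(map[uint32]*Type)}
}

// TypeByID decodes a record on first use and returns the cached pointer
// afterwards, so two lookups of the same ID compare equal by pointer.
func (s *LazySpec) TypeByID(id uint32) (*Type, error) {
	if t, ok := s.cache[id]; ok {
		return t, nil
	}
	off := int(id) * recordSize
	if off+recordSize > len(s.blob) {
		return nil, fmt.Errorf("type id %d out of range", id)
	}
	rec := s.blob[off : off+recordSize]
	info := binary.LittleEndian.Uint32(rec[4:8])
	t := &Type{
		ID:         id,
		NameOff:    binary.LittleEndian.Uint32(rec[0:4]),
		Kind:       uint8((info >> 24) & 0x1f), // kind lives in bits 24-28 of info
		SizeOrType: binary.LittleEndian.Uint32(rec[8:12]),
	}
	s.cache[id] = t
	s.decoded++
	return t, nil
}

// buildBlob fabricates n records purely for demonstration.
func buildBlob(n int) []byte {
	blob := make([]byte, n*recordSize)
	for i := 0; i < n; i++ {
		rec := blob[i*recordSize:]
		binary.LittleEndian.PutUint32(rec[0:4], uint32(i*4))
		binary.LittleEndian.PutUint32(rec[4:8], uint32(i%5+1)<<24)
		binary.LittleEndian.PutUint32(rec[8:12], uint32(i))
	}
	return blob
}

func main() {
	s := NewLazySpec(buildBlob(1000))
	a, _ := s.TypeByID(3)
	b, _ := s.TypeByID(3)
	fmt.Println(a == b, s.decoded) // prints "true 1"
}
```

Caching the inflated pointer is what keeps type comparisons cheap here; the alternative mentioned in the comment, comparers on all types, would trade that cache for a structural equality check.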
My main concern is / was the complexity of a lazy decoder. The whole "fixups" concept needs to be redone... I think you are right that peak usage seems more important. I see two avenues: one, there is some perf to be gained by not unmarshaling into an interface for rawType, I think. Two is the lazy decode you mentioned. I still have some old proofs of concept lying around, I'll push those somewhere.
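One possible reading of the first avenue, sketched with the standard library only: `binary.Read` takes its destination as an interface value and decodes structs via reflection, while reading the three 32-bit header fields directly avoids that path. The `header` struct and both function names are hypothetical and are not taken from cilium/ebpf.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// header mirrors the fixed 12-byte part of an encoded BTF type.
type header struct {
	NameOff    uint32
	Info       uint32
	SizeOrType uint32
}

// decodeReflect goes through binary.Read, which accepts the destination as
// an interface value and falls back to reflection for struct types.
func decodeReflect(buf []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &h)
	return h, err
}

// decodeDirect reads each field from the byte slice without an interface.
func decodeDirect(buf []byte) (header, error) {
	if len(buf) < 12 {
		return header{}, io.ErrUnexpectedEOF
	}
	return header{
		NameOff:    binary.LittleEndian.Uint32(buf[0:4]),
		Info:       binary.LittleEndian.Uint32(buf[4:8]),
		SizeOrType: binary.LittleEndian.Uint32(buf[8:12]),
	}, nil
}

func main() {
	buf := []byte{0x10, 0, 0, 0, 0x02, 0, 0, 0x04, 0x08, 0, 0, 0}
	a, errA := decodeReflect(buf)
	b, errB := decodeDirect(buf)
	fmt.Println(a == b, errA, errB) // prints "true <nil> <nil>"
}
```

Both functions produce the same value; whether the direct variant is measurably faster for a given workload would need a benchmark.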
@burak-ok could you come up with a list of types which you most frequently need from vmlinux? That way we can add a benchmark we can start optimising against. Right now that benchmark is decoding all of vmlinux which isn't useful.
I hope the following helps:
Here is a list from a single program which gets loaded every time for Inspektor Gadget:
A list from multiple programs combined which get loaded every time for Inspektor Gadget:
And another list from multiple programs, which get loaded every time and 4 gadgets(
Add a benchmark which replicates the types used by Inspektor Gadget for a common configuration. See cilium#1755 (comment) Signed-off-by: Lorenz Bauer <[email protected]>
I added roughly the same benchmarks that you posted for this PR in the top post. I'll take a deeper look into your PR some time later, thanks for opening it.
Given that we have memory limits, I think this would be the best-case scenario for us.
Add a benchmark which replicates the types used by Inspektor Gadget for a common configuration. Also add a benchmark which explicitly iterates all types in vmlinux, which is similar to what pwru does. See cilium#1755 (comment) Signed-off-by: Lorenz Bauer <[email protected]>
The new lazy BTF code is in. @burak-ok could you try the code and report back how much of a difference it makes?
This version introduces a lot of improvements. It's worth mentioning this one, cilium/ebpf#1755, which reduces memory usage by ~25MB. It also bumps the Go version to 1.23.
Based on #1589
This is a proof-of-concept. The code is not ready for merging but it shows it is possible to significantly reduce the memory consumption (by 27MB).
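This PR aims to lower the memory footprint when using the cilium/ebpf library. This is achieved in two ways:
1. Disabling the caching of the BTF via a new flag.
2. Parsing only the needed BTF types instead of reading and interpreting everything.
The tradeoff for lowering the memory footprint is of course performance while loading eBPF programs etc...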
In this PoC, when using the new SpecCacheDisabled flag (1), it will also automatically check which BTF types are needed and load/parse only those (2).
Benchmarks:
- ParseVmlinux is always the base for both tests
- new.txt Inspektor Gadget run is with the types from [PoC] Introduce new flag SpecCacheDisabled & Parse only the requires BTF types #1755
- new.txt ParseVmlinux run is called BenchmarkParseVmlinuxWithoutFilter in this PR