kallsyms: optimise parsing by alban · Pull Request #1584 · cilium/ebpf

alban · 2024-10-14T09:39:06Z

In tests with pprof in Inspektor Gadget, I see allocations getting reduced from 34MB to 12MB.

I also added a standalone benchmark.

tl;dr: 12% faster, 42% less allocations in bytes, 48% less allocations

$ go test -run=BenchmarkKernelModuleMemory -bench=. ./internal/kallsyms/... -count 10 | tee main.bench
$ go test -run=BenchmarkKernelModuleMemory -bench=. ./internal/kallsyms/... -count 10 | tee patched.bench
$ benchstat main.bench patched.bench
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/internal/kallsyms
cpu: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
                     │ main.bench  │            patched.bench            │
                     │   sec/op    │   sec/op     vs base                │
KernelModuleMemory-4   299.9m ± 3%   262.1m ± 6%  -12.59% (p=0.000 n=10)

                     │  main.bench  │            patched.bench             │
                     │     B/op     │     B/op      vs base                │
KernelModuleMemory-4   37.70Mi ± 0%   21.67Mi ± 0%  -42.53% (p=0.000 n=10)

                     │ main.bench  │            patched.bench            │
                     │  allocs/op  │  allocs/op   vs base                │
KernelModuleMemory-4   432.1k ± 0%   221.9k ± 0%  -48.64% (p=0.000 n=10)

xref inspektor-gadget/inspektor-gadget#3571 (comment)

cc @burak-ok

florianl

great optimization! 👍

burak-ok

Thank you Alban for making it less memory hungry and faster :)

burak-ok · 2024-10-14T20:11:54Z

internal/kallsyms/kallsyms.go

-		fields := bytes.Fields(scanner.Bytes())
+		line := scanner.Bytes()
+		// bytes.Fields is expensive, so filter for '[' first
+		if !bytes.ContainsRune(line, '[') {


I am not that sure (since I don't know the format that well), but if the closing bracket ] is always at the end of the line we could utilize that to improve performance even further

Do you mean replacing bytes.ContainsRune by something like this?

if strings.HasSuffix(line, "]")

That should filter the same amount of lines as bytes.ContainsRune, so it's a matter of if strings.HasSuffix is faster than bytes.ContainsRune. I have not tested it but I suspect it is similar for short lines like the ones in kallsyms, and in any cases would not change the amount of memory allocations.

The printfs for /proc/kallsyms are implemented here:
https://github.com/torvalds/linux/blob/eca631b8fe808748d7585059c4307005ca5c5820/kernel/kallsyms.c#L446-L465

The last change was 3 years ago to add the buildid inside the brackets:
torvalds/linux@9294523#diff-9538b26d3e082f233e6adac664cd2c14cbf2d510d5d7f286eef329c58de87eadR451
This is only for kernels with CONFIG_STACKTRACE_BUILD_ID (which is not set for my kernel). But the way cilium/ebpf parses the line is fine with regards to that change.

So I would say the format seems more or less stable but can change. And testing "contains-[" should be more stable than testing "suffix-]", in case Linux adds new fields at the end.

burak-ok · 2024-10-15T10:14:54Z

so it's a matter of if strings.HasSuffix is faster than bytes.ContainsRune

That is what I meant. Instead of looping through every rune in the string it only needs to check the last one.

And testing "contains-[" should be more stable than testing "suffix-]", in case Linux adds new fields at the end.

That is what I had feared

ti-mo · 2024-10-15T14:13:14Z

This is a nice band-aid, but conflicts with what #1578 is trying to do. Reading this PR here made me realize the proposal in #1578 is even slower and more expensive, that will need to be improved, and we shouldn't have multiple ways of scanning kallsyms.

As an experiment, I wrote a naive barebones line parser for kallsyms that calls bytes.IndexAny(line, " ") repeatedly and re-slices the line:

func parseLine(line []byte) (addr uint64, name string, mod string, error)

I got your results down to this, basically cutting down another 3/4 of the size and number of allocations:

BenchmarkKernelModuleMemory-16    	       5	 220576087 ns/op	13158904 B/op	  148693 allocs/op

Note: this similarly skips lines containing [ and looks for tT symbol types only.

Perhaps we can write a simple kallsyms iterator on top of bufio.ReadLine (not using bufio.Scanner() since it allocates a ton). I'm not sure if we need to deal with utf-8 for splitting on space and \t. Having one iterator that can jump to the next line or word boundary would be ideal.

func (s *Scanner) NextLine() bool
func (s *Scanner) NextWord() bool

// These only return the current word, never a full line.
func (s *Scanner) WordBytes() []byte
func (s *Scanner) WordText() string

This avoids having to allocate an individual bufio.ScanWords scanner around a bytes.Reader for every line, since we have many short lines we need to parse into a (uint64, string) or (uint64, string, string) if it involved a kmod. If we need to filter out certain symbol types, we can call s.NextLine() if we're not interested in the remainder of the line. What we need to avoid is allocating slices and letting them escape to the heap, which is something bytes.Fields does in spades. Returning [][]byte should always result in 4 or 5 slice headers and a backing array for the outer slice being put on the heap.

In tests with pprof in Inspektor Gadget, I see allocations getting reduced from 34MB to 12MB. I also added a standalone benchmark. tl;dr: 12% faster, 42% less allocations in bytes, 48% less allocations $ go test -run=BenchmarkKernelModuleMemory -bench=. ./internal/kallsyms/... -count 10 | tee main.bench $ go test -run=BenchmarkKernelModuleMemory -bench=. ./internal/kallsyms/... -count 10 | tee patched.bench $ benchstat main.bench patched.bench goos: linux goarch: amd64 pkg: github.com/cilium/ebpf/internal/kallsyms cpu: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz │ main.bench │ patched.bench │ │ sec/op │ sec/op vs base │ KernelModuleMemory-4 299.5m ± 9% 262.3m ± 11% -12.44% (p=0.000 n=10) │ main.bench │ patched.bench │ │ B/op │ B/op vs base │ KernelModuleMemory-4 37.70Mi ± 0% 21.67Mi ± 0% -42.53% (p=0.000 n=10) │ main.bench │ patched.bench │ │ allocs/op │ allocs/op vs base │ KernelModuleMemory-4 432.1k ± 0% 221.9k ± 0% -48.64% (p=0.000 n=10) Signed-off-by: Alban Crequy <[email protected]>

alban · 2024-10-16T13:24:02Z

I rebased my PR to fix the conflict with #1578.

Your idea of custom scanner/iterator looks interesting. Is it something you want to write after this PR gets merged?

ti-mo · 2024-10-17T09:04:10Z

@alban Closing this in favor of #1588. It reduces allocs more or less to the same degree.

alban requested a review from a team as a code owner October 14, 2024 09:39

This was referenced Oct 14, 2024

Flush memory allocations when no longer needed inspektor-gadget/inspektor-gadget#3571

Closed

prog: don't search for kmod when not needed #1585

Closed

cilium/ebpf memory optimizations inspektor-gadget/inspektor-gadget#3583

Closed

florianl approved these changes Oct 14, 2024

View reviewed changes

burak-ok approved these changes Oct 14, 2024

View reviewed changes

ti-mo mentioned this pull request Oct 15, 2024

feat: add support for untyped extern __ksym variables #1578

Merged

alban force-pushed the alban_kallsyms branch from 8a3afcf to 8489bd0 Compare October 16, 2024 13:16

ti-mo closed this Oct 17, 2024

alban mentioned this pull request Oct 17, 2024

go.mod: update cilium/ebpf with some memory optimization inspektor-gadget/inspektor-gadget#3595

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

kallsyms: optimise parsing#1584

kallsyms: optimise parsing#1584
alban wants to merge 1 commit intocilium:mainfrom
alban:alban_kallsyms

alban commented Oct 14, 2024 •

edited

Loading

Uh oh!

florianl left a comment

Uh oh!

burak-ok left a comment

Uh oh!

burak-ok Oct 14, 2024

Uh oh!

alban Oct 15, 2024

Uh oh!

burak-ok commented Oct 15, 2024

Uh oh!

ti-mo commented Oct 15, 2024

Uh oh!

alban commented Oct 16, 2024

Uh oh!

ti-mo commented Oct 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

Conversation

alban commented Oct 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

florianl left a comment

Choose a reason for hiding this comment

Uh oh!

burak-ok left a comment

Choose a reason for hiding this comment

Uh oh!

burak-ok Oct 14, 2024

Choose a reason for hiding this comment

Uh oh!

alban Oct 15, 2024

Choose a reason for hiding this comment

Uh oh!

burak-ok commented Oct 15, 2024

Uh oh!

ti-mo commented Oct 15, 2024

Uh oh!

alban commented Oct 16, 2024

Uh oh!

ti-mo commented Oct 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

alban commented Oct 14, 2024 •

edited

Loading