Conversation

@manugupt1 (Contributor) commented Apr 1, 2023

PR captures the reasoning behind this:
https://github.com/containerd/cgroups/pull/275/files#r1152562543

Reduce allocs in ReadUint64 by pre-allocating byte buffer

Before

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8             105118              9762 ns/op             848 B/op          5 allocs/op
PASS
ok      github.com/containerd/cgroups/v3/cgroup1        1.151s

After

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8             141484              8322 ns/op             128 B/op          3 allocs/op
PASS
ok      github.com/containerd/cgroups/v3/cgroup1        1.274s
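For orientation, a minimal sketch of the pre-allocated-buffer approach the title describes, assuming a readUint-style helper; strconv.ParseUint stands in for the package's own parseUint, the 128-byte size mirrors the "After" numbers rather than the exact diff, and the path in main is only an example:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readUint reads a single uint64 from a cgroup-style file using a small,
// pre-allocated buffer instead of letting os.ReadFile size one for us.
func readUint(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	// Fixed-size buffer: cgroup values are short, so 128 bytes is plenty.
	buf := make([]byte, 128)
	n, err := f.Read(buf)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(buf[:n])), 10, 64)
}

func main() {
	// Example path only; any single-value cgroup or /proc file works.
	v, err := readUint("/proc/self/loginuid")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(v)
}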

@manugupt1 manugupt1 force-pushed the without-sync branch 3 times, most recently from 04e821f to 213ea6a Compare April 1, 2023 23:58
@manugupt1 manugupt1 changed the title Have a byte buffer in ReadUint64 and use io.Copy to call into that buffer Reduce allocs in ReadUint64 by pre-allocating byte buffer Apr 2, 2023
@manugupt1 (Contributor, author):

Cc @fuweid @dcantah @AkihiroSuda

cgroup1/utils.go Outdated
Comment on lines 140 to 148
defer f.Close()
b := new(bytes.Buffer)
b.Reset()
_, err = io.Copy(b, f)
if err != nil {
return 0, err
}

return parseUint(strings.TrimSpace(b.String()), 10, 64)
Member:

Seems like just using a raw byte slice cuts allocs down to 4 (although I get 7 allocs on my machine while testing your change, instead of the 6 you posted). It doesn't need to be 80 bytes either, but > 64 might be necessary, seeing as this code took precautions to trim the string, so I'm assuming there are cases where there may be trailing space in these files.

f, err := os.Open(path)
if err != nil {
	return 0, err
}
defer f.Close()

buf := make([]byte, 80)
n, err := f.Read(buf)
if err != nil {
	return 0, err
}

return parseUint(strings.TrimSpace(string(buf[:n])), 10, 64)

Member:

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: AMD Ryzen 9 5900X 12-Core Processor            
BenchmarkReaduint64-24            849498              1342 ns/op             136 B/op          4 allocs/op
PASS
ok      github.com/containerd/cgroups/v3/cgroup1        1.160s

Contributor (author):

Yeah! It is significantly lower. I used bytes.Buffer because of TrimSpace, but adding extra bytes works as well.
I think our benchmark runs are different as well (mine ends with -8 and yours with -24); maybe that is why the alloc counts differ.

@manugupt1 (Contributor, author) commented Apr 2, 2023:

I think we can benchmark bytes.Buffer (mostly because I think it is better not to be too smart about guessing file lengths while reading from them) in the places where we do ReadFile across the entire package, and see if we can reduce allocations in follow-up PRs.
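A hedged sketch of what such a follow-up benchmark could look like (a hypothetical *_test.go file; the path and benchmark names are placeholders, not code from this repo):

package cgroup1

import (
	"os"
	"strconv"
	"strings"
	"testing"
)

// benchPath is a placeholder; any small single-value file works, though a
// /proc or cgroupfs file is closer to the real call sites.
const benchPath = "/proc/self/loginuid"

// BenchmarkReadFileUint64 measures the os.ReadFile-based pattern used
// elsewhere in the package.
func BenchmarkReadFileUint64(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		v, err := os.ReadFile(benchPath)
		if err != nil {
			b.Fatal(err)
		}
		if _, err := strconv.ParseUint(strings.TrimSpace(string(v)), 10, 64); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkFixedBufferUint64 measures the pre-allocated-buffer pattern
// introduced in this PR.
func BenchmarkFixedBufferUint64(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		f, err := os.Open(benchPath)
		if err != nil {
			b.Fatal(err)
		}
		buf := make([]byte, 128)
		n, err := f.Read(buf)
		f.Close()
		if err != nil {
			b.Fatal(err)
		}
		if _, err := strconv.ParseUint(strings.TrimSpace(string(buf[:n])), 10, 64); err != nil {
			b.Fatal(err)
		}
	}
}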

Member:

That's fine with me

@dcantah (Member) left a comment:

LGTM (just one suggestion)

cgroup1/utils.go Outdated
 	}
-	return parseUint(strings.TrimSpace(string(v)), 10, 64)
+	defer f.Close()
+	b := make([]byte, 80)
Member:

Actually, one more comment. Perhaps we should put a note above this line explaining the reasoning behind 80, e.g.: it's an arbitrary value above 64 to account for trailing space (the most I remember seeing was a single space, but 🤷‍♂️), it should fit on the stack, and the extra 16 bytes over 64 shouldn't hurt anyone.
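One possible wording, offered only as a sketch of such a suggestion (not the exact comment that ended up in the PR):

	// Pre-allocate a fixed-size buffer for the value. The size is somewhat
	// arbitrary: anything above 64 bytes leaves room for a uint64 plus any
	// trailing whitespace, and a buffer this small should stay on the stack.
	b := make([]byte, 80)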

@manugupt1 (Contributor, author) commented Apr 4, 2023:

PTAL and let me know if the comment needs more modification. Could you add a suggestion with the entire wording, so that I can commit it directly? Also, thanks for the quick reviews.

@kolyshkin (Contributor):

Before

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkPids-8             59758             19905 ns/op            2064 B/op         15 allocs/op

Not sure why you use BenchmarkPids before and BenchmarkReaduint64 after.

@kolyshkin (Contributor):

Technically speaking, the benchmark should use a real cgroupfs file (or any other file from /proc). The reason is that normal files have a size, and os.ReadFile uses that information to prepare a buffer of the needed size. OTOH, many cgroupfs (and /proc) files report a size of 0, which makes os.ReadFile allocate a 512-byte buffer.

I would recommend reading something like /proc/self/loginuid.
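A small sketch illustrating the point, assuming /proc/self/loginuid is available (the regular file used for contrast is just an example): os.ReadFile sizes its buffer from Stat, and these special files report a size of 0, so it falls back to a 512-byte allocation.

package main

import (
	"fmt"
	"os"
)

func main() {
	// /proc/self/loginuid reports a size of 0 even though it has content, so
	// os.ReadFile cannot size its buffer from Stat and falls back to 512
	// bytes; /etc/hostname is a regular file with a real size, for contrast.
	for _, p := range []string{"/proc/self/loginuid", "/etc/hostname"} {
		fi, err := os.Stat(p)
		if err != nil {
			fmt.Println(p, err)
			continue
		}
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Println(p, err)
			continue
		}
		fmt.Printf("%-22s stat size=%d, bytes actually read=%d\n", p, fi.Size(), len(data))
	}
}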

@kolyshkin (Contributor):

I see the number of allocations going down from 5 to 3 when using Go 1.20.2 (or 6 to 4 when using Go 1.19.6). This is far from the savings reported in the PR description (which is obviously wrong, as it used BenchmarkPids for the "before" measurement).

Of the 2 allocations we save, one comes from the f.Stat call made by os.ReadFile. Indeed, we don't need stat() when working with cgroup files. The other, I guess, comes from inlining (if I move the file reading to a separate function, the number of allocs goes up from 3 to 4).

So yes, the commit makes sense; the description, though, needs to be modified to:

  • show the correct before and after measurements
  • explain that we avoid os.ReadFile because:
    • it calls Stat, which we do not need
    • it pre-allocates 512 bytes while we're fine with 80

In addition, I would change 80 to 128 for alignment.

@kolyshkin (Contributor):

Finally, note that there are a few other implementations in this repo that are very similar to readUint. They are all in the cgroup2 package; I was able to find two:

  • getStatFileContentUint64 (uses os.ReadFile)
  • readSingleFile (uses io.ReadAll)

I think those need to be either consolidated or fixed in a similar manner. Otherwise, we'll have the same problem in a couple of years once everyone has switched to cgroup v2.
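As a rough illustration of the consolidation idea, a hypothetical shared helper that single-value readers such as getStatFileContentUint64 could sit on top of (names, location, and buffer size are illustrative, not actual cgroup2 code):

package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readSingleValue reads one short value from a cgroup interface file using a
// fixed buffer, mirroring the cgroup1 readUint change.
func readSingleValue(dir, file string) (string, error) {
	f, err := os.Open(filepath.Join(dir, file))
	if err != nil {
		return "", err
	}
	defer f.Close()

	buf := make([]byte, 128)
	n, err := f.Read(buf)
	if err != nil && err != io.EOF {
		return "", err
	}
	return strings.TrimSpace(string(buf[:n])), nil
}

// readSingleUint64 shows what a getStatFileContentUint64-style caller might
// look like on top of the shared helper.
func readSingleUint64(dir, file string) (uint64, error) {
	s, err := readSingleValue(dir, file)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	// Example only; real callers would pass a cgroup v2 directory and a file
	// such as pids.current.
	v, err := readSingleUint64("/proc/self", "loginuid")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(v)
}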

@dcantah (Member) commented Apr 4, 2023:

@kolyshkin Oops, you beat me to it with your last comment. I was drafting an issue for similar fixes we could make here. I have the same two you found so far; there may be more.

@dcantah (Member) commented Apr 4, 2023:

> Technically speaking, the benchmark should use a real cgroupfs file (or any other file from /proc). The reason is that normal files have a size, and os.ReadFile uses that information to prepare a buffer of the needed size. OTOH, many cgroupfs (and /proc) files report a size of 0, which makes os.ReadFile allocate a 512-byte buffer.

It's probably better to test on some magic file, so +1, but given that the change here is for us to provide the buffer, the ReadFile behavior doesn't come into the picture, right?

@manugupt1 (Contributor, author) commented Apr 5, 2023:

Yes to all of the above. I will do follow-ups in the coming days. I also noticed the same things.

😂

@kolyshkin (Contributor):

> It's probably better to test on some magic file, so +1, but given that the change here is for us to provide the buffer, the ReadFile behavior doesn't come into the picture, right?

os.ReadFile is used in the "before" benchmark.

@dcantah (Member) commented Apr 5, 2023:

> > It's probably better to test on some magic file, so +1, but given that the change here is for us to provide the buffer, the ReadFile behavior doesn't come into the picture, right?
>
> os.ReadFile is used in the "before" benchmark.

Ahh, I see what you were getting at now 👍.

@manugupt1 manugupt1 force-pushed the without-sync branch 2 times, most recently from 31cb059 to d17b8e1 Compare April 5, 2023 05:14
@manugupt1 (Contributor, author):

Sorry; I got mixed up while working. @kolyshkin is right! It is 5 --> 3

@manugupt1 manugupt1 requested a review from kolyshkin April 5, 2023 05:14
Before

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8   	  105118	      9762 ns/op	     848 B/op	       5 allocs/op
PASS
ok  	github.com/containerd/cgroups/v3/cgroup1	1.151s

After

Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m

goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8   	  141484	      8322 ns/op	     128 B/op	       3 allocs/op
PASS
ok  	github.com/containerd/cgroups/v3/cgroup1	1.274s

Signed-off-by: Manu Gupta <[email protected]>
@kolyshkin (Contributor) left a comment:

LGTM, except we need to also address these: #278 (comment)

@dcantah (Member) left a comment:

LGTM. @kolyshkin, I was working on fixing the two cases we found for cg2 (which reminds me I should open my issue here..)

@brancz commented Apr 6, 2023:

Ah awesome, I was going to get back to this today, glad you already worked it out! Re-running the benchmarks I now see a total improvement of ~25%!

$ benchstat old.txt new.txt
name          old time/op    new time/op    delta
Readuint64-8    5.19µs ± 2%    3.87µs ± 1%  -25.48%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
Readuint64-8      840B ± 0%      120B ± 0%  -85.71%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
Readuint64-8      5.00 ± 0%      3.00 ± 0%  -40.00%  (p=0.000 n=10+10)

That means we should reap a total improvement of about 4%! Awesome!

I'm going to go ahead and close my PR.

@manugupt1 (Contributor, author):

Thanks for all the help, @brancz, and for finding this issue.

@fuweid (Member) left a comment:

LGTM
