Reduce allocs in ReadUint64 by pre-allocating byte buffer #278
Conversation
Force-pushed the branch from 04e821f to 213ea6a
cgroup1/utils.go
Outdated
defer f.Close()
b := new(bytes.Buffer)
b.Reset()
_, err = io.Copy(b, f)
if err != nil {
	return 0, err
}

return parseUint(strings.TrimSpace(b.String()), 10, 64)
It seems just using a raw byte slice cuts allocs down to 4 (although I get 7 allocs on my machine while testing your change, instead of the 6 you posted). It doesn't need to be 80 bytes either, but > 64 might be necessary, seeing as this code took precautions to trim the string, so I'm assuming there are cases where these files may have trailing whitespace.
f, err := os.Open(path)
if err != nil {
return 0, err
}
defer f.Close()
buf := make([]byte, 80)
n, err := f.Read(buf)
if err != nil {
return 0, err
}
return parseUint(strings.TrimSpace(string(buf[:n])), 10, 64)
goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkReaduint64-24 849498 1342 ns/op 136 B/op 4 allocs/op
PASS
ok  	github.com/containerd/cgroups/v3/cgroup1	1.160s
Yeah! It is significantly lower. I used a buffer because of TrimSpace, but adding extra bytes works as well.
I think our machines are different as well (my benchmark name ends with -8 and yours with -24, i.e. GOMAXPROCS); maybe that is why the alloc counts differ.
I think we can benchmark bytes.Buffer (mostly because I think it is better not to be too smart about guessing file lengths while reading from them) in the places where we do ReadFile across the entire package, and see if we can reduce allocations in follow-up PRs.
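As an illustration only, a minimal sketch of what such a follow-up benchmark could look like, comparing os.ReadFile against a bytes.Buffer read of the same file. The cgroup path, helper, and benchmark names are assumptions for illustration, not code from this PR:

package cgroup1

import (
	"bytes"
	"io"
	"os"
	"testing"
)

// benchPathOrSkip returns a real cgroupfs value file to read, or skips the
// benchmark if it is not present. The path is an assumption for illustration.
func benchPathOrSkip(b *testing.B) string {
	const path = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
	if _, err := os.Stat(path); err != nil {
		b.Skip("cgroup v1 memory controller not mounted")
	}
	return path
}

// BenchmarkReadFileAlloc measures the os.ReadFile-style pattern used elsewhere in the package.
func BenchmarkReadFileAlloc(b *testing.B) {
	path := benchPathOrSkip(b)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := os.ReadFile(path); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkBytesBufferAlloc measures reading the same file through a bytes.Buffer instead.
func BenchmarkBytesBufferAlloc(b *testing.B) {
	path := benchPathOrSkip(b)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		f, err := os.Open(path)
		if err != nil {
			b.Fatal(err)
		}
		var buf bytes.Buffer
		if _, err := io.Copy(&buf, f); err != nil {
			f.Close()
			b.Fatal(err)
		}
		f.Close()
	}
}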
That's fine with me
dcantah
left a comment
LGTM (just one suggestion)
cgroup1/utils.go
Outdated
}
return parseUint(strings.TrimSpace(string(v)), 10, 64)
defer f.Close()
b := make([]byte, 80)
Actually, one more comment. Perhaps we should put a note above this explaining the reasoning behind 80, e.g. it's an arbitrary value above 64 to account for trailing whitespace (the most I remember seeing was just one space, but 🤷♂️), it should fit on the stack, and the extra 16 bytes over 64 shouldn't hurt anyone.
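One possible shape for that note, sketched against the function in this diff. The exact wording is only a suggestion, not the committed text, and it assumes the usual os/strings imports plus the package's existing parseUint helper:

// readUint reads an uint64 value from the given cgroup file.
// The 80-byte buffer is somewhat arbitrary: the longest uint64 is 20 digits,
// so anything above 64 comfortably covers the value plus any trailing
// whitespace (hence the TrimSpace below), and the extra 16 bytes over 64
// are harmless.
func readUint(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	b := make([]byte, 80)
	n, err := f.Read(b)
	if err != nil {
		return 0, err
	}
	return parseUint(strings.TrimSpace(string(b[:n])), 10, 64)
}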
PTAL and let me know if the comment needs more modification. Could you add a suggestion with the entire wording, so that I can commit it directly? Also, thanks for the quick reviews.
Not sure why you use …
Technically speaking, the benchmark should use a real cgroupfs file (or any other file from /proc). The reason is that normal files have a size, and I would recommend reading something like …
I see the number of allocations going down from 5 to 3 when using Go 1.20.2 (or 6 to 4 when using Go 1.19.6). This is far from the savings reported in the PR description (which is obviously wrong, as it says …).

Out of these 2 allocations, one comes from the …

So yes, the commit makes sense; the description, though, needs to be modified to …
In addition, I would change 80 to 128 for alignment.
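Following the recommendation above to benchmark against a real procfs/cgroupfs file (such special files report a zero size to stat, unlike a regular temp file, so size-based preallocation gets no hint), a sketch could look like the following. The path and benchmark name are assumptions, and readUint is the helper from cgroup1/utils.go:

package cgroup1

import (
	"os"
	"testing"
)

func BenchmarkReaduint64ProcFile(b *testing.B) {
	// /proc/sys/kernel/pid_max holds a single integer and, like cgroupfs files,
	// reports a zero size to stat.
	const path = "/proc/sys/kernel/pid_max"
	if _, err := os.Stat(path); err != nil {
		b.Skip("procfs not available:", err)
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := readUint(path); err != nil {
			b.Fatal(err)
		}
	}
}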
Finally, note that there are a few other implementations in this repo that are very similar to …
I think those need to be either consolidated or fixed in a similar manner. Otherwise, we'll have the same problem in a couple of years once everyone has switched to cgroup v2.
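A rough sketch of what a consolidated helper shared by the cgroup v1 and v2 code paths could look like. The name readSingleValue and the 128-byte size are illustrative assumptions (per the alignment suggestion above), not code from this repo, and the usual os/strings imports are assumed:

// readSingleValue reads a small single-value file (cgroupfs, procfs) using a
// fixed-size buffer, avoiding the extra allocations of io.ReadAll/os.ReadFile
// on special files that report a zero size.
func readSingleValue(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	buf := make([]byte, 128)
	n, err := f.Read(buf)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(buf[:n])), nil
}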
@kolyshkin Oops, you beat me to your last comment. I was drafting an issue for similar fixes we could make here. I have the same two you found so far; there may be more.
It's probably better to test on some magic file, so +1, but given that the change here is for us to provide the buffer, the …
Yes to all of the above. I will do follow-ups in the coming days. I also noticed the same things. 😂
The …
Ahh, I see what you were getting at now 👍.
Force-pushed the branch from 31cb059 to d17b8e1
Sorry; I got mixed up while working. @kolyshkin is right! It is 5 --> 3.
Before:
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m
goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8   	  105118	      9762 ns/op	     848 B/op	       5 allocs/op
PASS
ok  	github.com/containerd/cgroups/v3/cgroup1	1.151s

After:
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -bench ^BenchmarkReaduint64$ github.com/containerd/cgroups/v3/cgroup1 -count=1 -timeout 30m
goos: linux
goarch: amd64
pkg: github.com/containerd/cgroups/v3/cgroup1
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkReaduint64-8   	  141484	      8322 ns/op	     128 B/op	       3 allocs/op
PASS
ok  	github.com/containerd/cgroups/v3/cgroup1	1.274s

Signed-off-by: Manu Gupta <[email protected]>
kolyshkin
left a comment
LGTM, except we need to also address these: #278 (comment)
dcantah
left a comment
LGTM. @kolyshkin, I was working on fixing the two cases we found for cg2 (which reminds me I should open my issue here..)
Ah awesome, I was going to get back to this today; glad you already worked it out! Re-running the benchmarks, I now see a total improvement of ~25%! That means we should reap a total improvement of about 4%! Awesome! I'm going to go ahead and close my PR.
Thanks for all the help, @brancz, and for finding this issue.
fuweid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The reasoning behind this is captured in this PR comment:
https://github.com/containerd/cgroups/pull/275/files#r1152562543