Reduce short-lived allocations in memory maps parsing #331
Conversation
While profiling an application that uses procfs I noticed that most of
the short-lived allocations came from calling `maps()`. This happens
because every iterator call on the result of `lines()` returns an owned
`String` rather than a string reference.

The CPU time only decreased by ~3%. Modern memory allocators can be fast, but I thought it could still be worth it, as the allocation count is significantly reduced.
```rust
// Benchmark driver: parse the memory maps of every process on the system.
for process in procfs::process::all_processes().unwrap() {
    let Ok(proc) = process else {
        continue;
    };
    let maps = proc.maps()?;
}
```
```
$ sudo /usr/bin/hyperfine ./currentprocfs ./newprocfs --warmup 5
Benchmark 1: ./currentprocfs
Time (mean ± σ): 29.7 ms ± 0.3 ms [User: 12.4 ms, System: 17.2 ms]
Range (min … max): 28.4 ms … 30.5 ms 97 runs
Benchmark 2: ./newprocfs
Time (mean ± σ): 29.0 ms ± 0.3 ms [User: 11.8 ms, System: 17.0 ms]
Range (min … max): 27.9 ms … 30.4 ms 100 runs
Summary
./newprocfs ran
1.03 ± 0.02 times faster than ./currentprocfs
```
Test plan
=========
CI + ran my application with this patch for a little while without any
issues (and integration tests pass too). Correctly reading memory
mappings is at the core of the application I work on, so any issues
would result in pretty bad failures.
If this optimisation is accepted, perhaps we could open issues for some of the other
```rust
fn from_buf_read<R: BufRead>(mut reader: R) -> ProcResult<Self> {
    let mut line_iter = reader.lines().map(|r| r.map_err(|_| ProcError::Incomplete(None)));
    let mut memory_maps = Vec::with_capacity(10);
```
This was an educated guess that I cross-checked on my dev box:
```
$ for maps in /proc/*/maps; do sudo cat $maps | wc -l; done 2>/dev/null | grep -v 0 | sort | uniq -c | sort -k2h
      6 23
      1 26
     12 31
      1 33
      2 36
      2 45
      2 57
      1 59
      1 63
      1 78
      1 82
      1 83
[...]
```

```rust
    let mut memory_maps = Vec::with_capacity(10);
    let mut current_memory_map: Option<MemoryMap> = None;
    while let Some(line) = line_iter.next().transpose()? {
    let mut line = String::with_capacity(100);
```
Based on:

```
len("55fa56b4f000-55fa56b51000 r--p 00000000 00:1f 5749885190 /usr/bin/cat")
85
```
Hi @eminence. Let me know if there's anything you would like me to do. Happy to run anything else you consider to be missing. Thanks 😄

BTW, would really appreciate a release if you could cut one!

Sure, I'll take a look this weekend. Thanks for the ping on this.

Looks good to me. I like this optimization because it doesn't really make the code any more complicated or harder to read. I don't mind changing other uses of

Makes sense. Just to give you some additional context, I am using procfs in a profiler I am working on. I am trying to reduce its overhead as much as possible, and this was somewhat of a micro-optimisation since most memory allocators are very fast, but since it was the top source of short-lived allocations for my project, I went for it. I will abstain from submitting performance improvements unless they show a significant decrease in CPU usage or allocations.

To be clear: I think if the improvement is measurable, then it should be considered. The main thing I'd like to avoid is a flood of micro-optimization PRs that don't show any measurable improvement. The hyperfine benchmark you included in this PR was perfect for showing a small but measurable improvement.