Reduce short-lived allocations in memory maps parsing#331

Merged
eminence merged 1 commit into eminence:master from javierhonduco:reduce-short-lived-allocations-parsing-memory-maps
Feb 8, 2025

Conversation

@javierhonduco
Contributor

While profiling an application that uses procfs, I noticed that most of the short-lived allocations came from calling `maps()`. This seems to happen because the iterator returned by `lines()` yields an owned `String` on every call rather than a borrowed string slice.

CPU time only decreased by ~3%. Modern memory allocators can be fast, but I thought the change could still be worth it since the allocation count is significantly reduced.

```rust
for process in procfs::process::all_processes().unwrap() {
    let Ok(proc) = process else {
        continue;
    };
    let maps = proc.maps()?;
}
```
```
$ sudo /usr/bin/hyperfine ./currentprocfs ./newprocfs --warmup 5
Benchmark 1: ./currentprocfs
  Time (mean ± σ):      29.7 ms ±   0.3 ms    [User: 12.4 ms, System: 17.2 ms]
  Range (min … max):    28.4 ms …  30.5 ms    97 runs

Benchmark 2: ./newprocfs
  Time (mean ± σ):      29.0 ms ±   0.3 ms    [User: 11.8 ms, System: 17.0 ms]
  Range (min … max):    27.9 ms …  30.4 ms    100 runs

Summary
  ./newprocfs ran
    1.03 ± 0.02 times faster than ./currentprocfs
```

Test plan

CI, plus I ran my application with these patches for a little while without any issues (and the integration tests pass too). Correctly reading memory mappings is at the core of the application I work on, so any issues would result in pretty bad failures.

@javierhonduco
Contributor Author

If this optimisation is accepted, perhaps we could open issues for some of the other `lines()` call sites:


```rust
let mut line_iter = reader.lines().map(|r| r.map_err(|_| ProcError::Incomplete(None)));
fn from_buf_read<R: BufRead>(mut reader: R) -> ProcResult<Self> {
let mut memory_maps = Vec::with_capacity(10);
```
Contributor Author


This was an educated guess that I cross-checked on my dev box:

```
$ for maps in /proc/*/maps; do sudo cat $maps | wc -l; done 2>/dev/null | grep -v 0 | sort | uniq -c | sort -k2h
      6 23
      1 26
     12 31
      1 33
      2 36
      2 45
      2 57
      1 59
      1 63
      1 78
      1 82
      1 83
[...]
```

```rust
let mut memory_maps = Vec::with_capacity(10);
let mut current_memory_map: Option<MemoryMap> = None;
while let Some(line) = line_iter.next().transpose()? {
let mut line = String::with_capacity(100);
```
Contributor Author


Based on:

```
>>> len("55fa56b4f000-55fa56b51000 r--p 00000000 00:1f 5749885190                 /usr/bin/cat")
85
```
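As an aside on why a pre-sized buffer pays off across iterations: `String::clear` drops the contents but keeps the allocation, so the capacity reserved up front is reused on every subsequent `read_line` call. A quick standalone check (not from the patch):

```rust
fn main() {
    let mut line = String::with_capacity(100);
    // A typical /proc/<pid>/maps line fits comfortably in 100 bytes.
    line.push_str("55fa56b4f000-55fa56b51000 r--p 00000000 00:1f 5749885190 /usr/bin/cat");
    let cap = line.capacity();
    line.clear(); // length goes to 0, but the allocation is kept
    assert_eq!(line.len(), 0);
    assert_eq!(line.capacity(), cap); // capacity unchanged after clear()
    println!("capacity still {} after clear", line.capacity());
}
```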

@javierhonduco
Contributor Author

Hi @eminence. Let me know if there's anything you would like me to do. Happy to run anything else you consider to be missing. Thanks 😄

@javierhonduco
Contributor Author

BTW, would really appreciate a release if you could cut one!

@eminence
Owner

eminence commented Feb 6, 2025

Sure, I'll take a look this weekend. Thanks for the ping on this

@eminence
Owner

eminence commented Feb 8, 2025

Looks good to me. I like this optimization because it doesn't really make the code any more complicated or harder to read. I don't mind changing other uses of .lines() but if possible I'd like to see that the change actually makes a difference.
Thanks!

@eminence eminence merged commit faee997 into eminence:master Feb 8, 2025
6 checks passed
@javierhonduco javierhonduco deleted the reduce-short-lived-allocations-parsing-memory-maps branch February 9, 2025 16:11
@javierhonduco
Contributor Author

Makes sense. Just to give you some additional context: I am using procfs in a profiler I am working on. I am trying to reduce its overhead as much as possible, and this was somewhat of a micro-optimisation, since most memory allocators are very fast, but as it was the top source of short-lived allocations in my project, I went for it.

I will abstain from submitting performance improvements unless they show a significant decrease in CPU usage or allocations.

@eminence
Owner

To be clear: I think if the improvement is measurable, then it should be considered. The main thing I'd like to avoid is a flood of micro-optimization PRs that don't show any measurable improvement. The hyperfine benchmark you included in this PR was perfect for showing a small but measurable improvement.
