-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
cat: Performance improvement when printing line numbers #7645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Here's some numbers for the performance gains with this change... So ~70% faster than the current mainline implementation, and a tiny bit faster than the GNU implementation. If you pull in #7642 too we actually end up quite a bit quicker than GNU :) |
|
GNU testsuite comparison: |
d14f4a6 to
90828e6
Compare
|
GNU testsuite comparison: |
src/uu/cat/src/cat.rs
Outdated
| fn new() -> Self { | ||
| LineNumber { | ||
| // Initialize buf to b" 1\t" | ||
| buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could write this as:
| buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'], | |
| buf: Vec::from(b" 1\t"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - I've made that change 👍
src/uu/cat/src/cat.rs
Outdated
| // If we hit anything other than a b'9' we can break since the next digit is | ||
| // unaffected. | ||
| // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'. | ||
| // If/else is faster than match since we can prioritize most likely digits first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you benchmarked this? I think both would be fine here, but a match would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the match is nicer, but benchmarking shows a small benefit with the if-else logic.
Here's a benchmark result from my machine (cat => this code, cat.match => if-else replaced with a match)...
$ hyperfine -L cat ./target/release/cat,./target/release/cat.match "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
Time (mean ± σ): 5.445 s ± 0.078 s [User: 4.827 s, System: 0.613 s]
Range (min … max): 5.351 s … 5.635 s 10 runs
Benchmark 2: ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
Time (mean ± σ): 5.860 s ± 0.102 s [User: 5.239 s, System: 0.616 s]
Range (min … max): 5.722 s … 6.056 s 10 runs
Summary
./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
1.08 ± 0.02 times faster than ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
So it's a few% faster, worth the gain I think.
|
Very Cool! |
Many thanks for the review! |
90828e6 to
b6131f6
Compare
|
GNU testsuite comparison: |
| } | ||
|
|
||
| // Logic to store a string for the line number. Manually incrementing the value | ||
| // represented in a buffer like this is significantly faster than storing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very similar to the fast_inc function I have here: https://github.com/uutils/coreutils/pull/7564/files#diff-8ef9575cb5c2b35e15751308bf63d6dbf3ac217a383790f9355bc5e13ab2fdb9R272
I wonder if we should try to reuse that? (my function is a bit more generic as increment can be other values that aren't 1, so I'm not 100% sure how it'll do in terms of performance)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to leave my code as-is for now. I've done some benchmarking and even a small change in the logic I've written has a measurable difference on the overall performance, so I think moving to a generic function would slow things down quite a bit.
Do you think you'll ever implement a specialization for the common-case of adding exactly 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, happy to play with this after both our PR are merged ,-)
I'm hoping that by forcing #[inline] the compiler will give us the specialized version for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I played with it anyway ,-P
Dirty commit here: 606655e gets within 1-2% of your implementation.
And 03a0481 gets 1% faster with an optimized version, but it's a bit unfortunate that we need to copy code (I can't see how we can avoid that). edit: bc689e5 ever more optimized...
I'll need to check what happens when I move the function to uucore (not sure yet to understand how rust does things w.r.t. compile/link...)
One difference is that I preallocate a larger buffer, so that the vector never needs to be grown (we'll never reach more than 1024 digits ,-P)
(again, I don't mean to block this PR, happy to look further after merge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, for sure if there's a version in uucore that gives the same performance then let's use that 👍
Add a simple class to manually maintain a string representation of the line number for the `cat` application. Maintaing this string is much faster than converting a `usize` line-number variable to a string each time it's needed. Gives a significant performance improvement with -n and -b flags.
b6131f6 to
c56489e
Compare
|
GNU testsuite comparison: |
| // If we hit anything other than a b'9' we can break since the next digit is | ||
| // unaffected. | ||
| // Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'. | ||
| // If/else here is faster than match (as measured with some benchmarking Apr-2025), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, the assembly is indeed a bit different in main here:
https://godbolt.org/z/fr6crT441
Add a simple class to manually maintain a string representation of the line number for the
catapplication.Maintaing this string is much faster than converting a
usizeline-number variable to a string each time it's needed.Gives a significant performance improvement with
-nand-bflags.