Skip to content

Conversation

@karlmcdowall
Copy link
Contributor

@karlmcdowall karlmcdowall commented Apr 4, 2025

Add a simple class to manually maintain a string representation of the line number for the cat application.
Maintaing this string is much faster than converting a usize line-number variable to a string each time it's needed.

Gives a significant performance improvement with -n and -b flags.

@karlmcdowall
Copy link
Contributor Author

karlmcdowall commented Apr 4, 2025

Here's some numbers for the performance gains with this change...

$ hyperfine -L cat /usr/bin/cat,./target/release/cat.original,./target/release/cat "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: /usr/bin/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.200 s ±  0.104 s    [User: 4.831 s, System: 0.365 s]
  Range (min … max):    5.103 s …  5.422 s    10 runs
 
Benchmark 2: ./target/release/cat.original -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      8.844 s ±  0.111 s    [User: 8.254 s, System: 0.587 s]
  Range (min … max):    8.687 s …  9.007 s    10 runs
 
Benchmark 3: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.146 s ±  0.082 s    [User: 4.552 s, System: 0.594 s]
  Range (min … max):    5.038 s …  5.259 s    10 runs
 
Summary
  ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
    1.01 ± 0.03 times faster than /usr/bin/cat -n ./wikidatawiki-20240901-pages-logging27.xml
    1.72 ± 0.03 times faster than ./target/release/cat.original -n ./wikidatawiki-20240901-pages-logging27.xml

So ~70% faster than the current mainline implementation, and a tiny bit faster than the GNU implementation.

If you pull in #7642 too we actually end up quite a bit quicker than GNU :)

@github-actions
Copy link

github-actions bot commented Apr 4, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@karlmcdowall karlmcdowall force-pushed the cat_line_number_formatting branch 2 times, most recently from d14f4a6 to 90828e6 Compare April 4, 2025 11:52
@github-actions
Copy link

github-actions bot commented Apr 4, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

fn new() -> Self {
LineNumber {
// Initialize buf to b" 1\t"
buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could write this as:

Suggested change
buf: vec![b' ', b' ', b' ', b' ', b' ', b'1', b'\t'],
buf: Vec::from(b" 1\t"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I've made that change 👍

// If we hit anything other than a b'9' we can break since the next digit is
// unaffected.
// Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'.
// If/else is faster than match since we can prioritize most likely digits first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you benchmarked this? I think both would be fine here, but a match would be nice.

Copy link
Contributor Author

@karlmcdowall karlmcdowall Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the match is nicer, but benchmarking shows a small benefit with the if-else logic.
Here's a benchmark result from my machine (cat => this code, cat.match => if-else replaced with a match)...

$ hyperfine -L cat ./target/release/cat,./target/release/cat.match "{cat} -n ./wikidatawiki-20240901-pages-logging27.xml"
Benchmark 1: ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.445 s ±  0.078 s    [User: 4.827 s, System: 0.613 s]
  Range (min … max):    5.351 s …  5.635 s    10 runs
 
Benchmark 2: ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml
  Time (mean ± σ):      5.860 s ±  0.102 s    [User: 5.239 s, System: 0.616 s]
  Range (min … max):    5.722 s …  6.056 s    10 runs
 
Summary
  ./target/release/cat -n ./wikidatawiki-20240901-pages-logging27.xml ran
    1.08 ± 0.02 times faster than ./target/release/cat.match -n ./wikidatawiki-20240901-pages-logging27.xml

So it's a few% faster, worth the gain I think.

@tertsdiepraam
Copy link
Member

Very Cool!

@karlmcdowall
Copy link
Contributor Author

Very Cool!

Many thanks for the review!

@karlmcdowall karlmcdowall force-pushed the cat_line_number_formatting branch from 90828e6 to b6131f6 Compare April 4, 2025 16:42
@github-actions
Copy link

github-actions bot commented Apr 4, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

}

// Logic to store a string for the line number. Manually incrementing the value
// represented in a buffer like this is significantly faster than storing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very similar to the fast_inc function I have here: https://github.com/uutils/coreutils/pull/7564/files#diff-8ef9575cb5c2b35e15751308bf63d6dbf3ac217a383790f9355bc5e13ab2fdb9R272

I wonder if we should try to reuse that? (my function is a bit more generic as increment can be other values that aren't 1, so I'm not 100% sure how it'll do in terms of performance)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to leave my code as-is for now. I've done some benchmarking and even a small change in the logic I've written has a measurable difference on the overall performance, so I think moving to a generic function would slow things down quite a bit.
Do you think you'll ever implement a specialization for the common-case of adding exactly 1?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, happy to play with this after both our PR are merged ,-)

I'm hoping that by forcing #[inline] the compiler will give us the specialized version for free.

Copy link
Collaborator

@drinkcat drinkcat Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I played with it anyway ,-P

Dirty commit here: 606655e gets within 1-2% of your implementation.

And 03a0481 gets 1% faster with an optimized version, but it's a bit unfortunate that we need to copy code (I can't see how we can avoid that). edit: bc689e5 ever more optimized...

I'll need to check what happens when I move the function to uucore (not sure yet to understand how rust does things w.r.t. compile/link...)

One difference is that I preallocate a larger buffer, so that the vector never needs to be grown (we'll never reach more than 1024 digits ,-P)

(again, I don't mean to block this PR, happy to look further after merge)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, for sure if there's a version in uucore that gives the same performance then let's use that 👍

Add a simple class to manually maintain a string representation
of the line number for the `cat` application.
Maintaing this string is much faster than converting a `usize`
line-number variable to a string each time it's needed.

Gives a significant performance improvement with -n and -b
flags.
@karlmcdowall karlmcdowall force-pushed the cat_line_number_formatting branch from b6131f6 to c56489e Compare April 5, 2025 01:50
@github-actions
Copy link

github-actions bot commented Apr 5, 2025

GNU testsuite comparison:

Congrats! The gnu test tests/misc/tee is no longer failing!

// If we hit anything other than a b'9' we can break since the next digit is
// unaffected.
// Also note that if we hit a b' ', we can think of that as a 0 and increment to b'1'.
// If/else here is faster than match (as measured with some benchmarking Apr-2025),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, the assembly is indeed a bit different in main here:
https://godbolt.org/z/fr6crT441

@sylvestre sylvestre merged commit f6cadac into uutils:main Apr 5, 2025
68 checks passed
@karlmcdowall karlmcdowall deleted the cat_line_number_formatting branch July 5, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants