[lazy] Skip over incompressible data#3552
Merged
terrelln merged 1 commit into facebook:dev on Mar 20, 2023
Conversation
Every 256 bytes the lazy match finders process without finding a match, they increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast & dfast, which only insert the positions they search.

This PR changes that, so now after we've searched 2KB without finding any matches, at which point we'll only be searching one in 9 positions, we'll stop inserting every position, and only insert the positions we search. The exact cutoff of 2KB isn't terribly important; I've just selected a cutoff that is reasonably large, to minimize the impact on "normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆  |
|---------|-------|--------------|---------|----------|
| Random  | 5     | clang-14.0.6 | 0.0%    | +704%    |
| Random  | 5     | gcc-12.2.0   | 0.0%    | +670%    |
| Random  | 7     | clang-14.0.6 | 0.0%    | +679%    |
| Random  | 7     | gcc-12.2.0   | 0.0%    | +657%    |
| Random  | 12    | clang-14.0.6 | 0.0%    | +1355%   |
| Random  | 12    | gcc-12.2.0   | 0.0%    | +1331%   |
| Silesia | 5     | clang-14.0.6 | +0.002% | +0.35%   |
| Silesia | 5     | gcc-12.2.0   | +0.002% | +2.45%   |
| Silesia | 7     | clang-14.0.6 | +0.001% | -1.40%   |
| Silesia | 7     | gcc-12.2.0   | +0.007% | +0.13%   |
| Silesia | 12    | clang-14.0.6 | +0.011% | +22.70%  |
| Silesia | 12    | gcc-12.2.0   | +0.011% | -6.68%   |
| Enwik8  | 5     | clang-14.0.6 | 0.0%    | -1.02%   |
| Enwik8  | 5     | gcc-12.2.0   | 0.0%    | +0.34%   |
| Enwik8  | 7     | clang-14.0.6 | 0.0%    | -1.22%   |
| Enwik8  | 7     | gcc-12.2.0   | 0.0%    | -0.72%   |
| Enwik8  | 12    | clang-14.0.6 | 0.0%    | +26.19%  |
| Enwik8  | 12    | gcc-12.2.0   | 0.0%    | -5.70%   |

The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issue. clang is significantly slower than gcc before this PR, but gets up to parity with it.
I also measured the ratio difference for the HC match finder, and it looks basically the same as the row-based match finder. The speedup on random data looks similar. And performance is about neutral, without the big difference at level 12 for either clang or gcc.
Contributor
Overall looks good. I wonder if you have a benchmark for compression ratio when we have a bunch of random data followed by compressible data?
Cyan4973 approved these changes on Mar 16, 2023
Fixes #3539.