Simple performance improvements for ldm#2464
Conversation
lib/compress/zstd_ldm.c
Outdated
| unsigned const offset = *pOffset; | ||
|
|
||
| *(ZSTD_ldm_getBucket(ldmState, hash, ldmParams) + offset) = entry; | ||
| *pOffset = (offset + 1) & (((U32)1 << ldmParams.bucketSizeLog) - 1); |
There was a problem hiding this comment.
nit: I believe the right hand side needs an explicit cast to appease the warning
There was a problem hiding this comment.
Thanks for the hint. This should be fixed by my latest commit.
|
Thanks for the improvements, this generally looks pretty good to me! The circle CI failure can probably get fixed with a rebase on top of the latest |
|
I've measured a 2.5% - 4.5% speed improvement on several different files, which is in line with the measurements you've provided. There are remaining calls to |
lib/compress/zstd_ldm.c
Outdated
| ldmEntry_t entry; | ||
|
|
||
| entry.offset = curr; | ||
| entry.checksum = checksum; | ||
| ZSTD_ldm_insertEntry(ldmState, hash, entry, *params); |
There was a problem hiding this comment.
We also insert when we find an entry right? Could we use ZSTD_ldm_insertEntry() in that case too?
There was a problem hiding this comment.
Yes, I thought it'd not bring much of a win but for the sake of consistency I updated it.
|
Thanks for your comments! I rebased on top of a recent dev branch and added one commit to address the feedback.
In |
terrelln
left a comment
There was a problem hiding this comment.
This looks great! I just have one minor nit about a renaming and it looks good to me.
lib/compress/zstd_ldm.c
Outdated
| U64 rollingHash = 0; | ||
|
|
||
| while (ip <= ilimit) { | ||
| U32 const currentOffset = (U32)(ip - base); |
There was a problem hiding this comment.
nit: this is not an offset it is an "index". Either keep curr or use currentIndex (note: current cannot be used because it conflicts with a macro in the Linux kernel).
I know that it is called offset in the ldmEntry_t struct, but that is a bad name. It should be renamed to index. If you want to include that renaming in this diff or a followup that would be great.
|
I renamed the variable to
|
terrelln
left a comment
There was a problem hiding this comment.
Awesome, thanks for the PR! Looking forward to the next one :)
|
Thanks @mpu ! I tested your PR on a (fairly stable) desktop-class workstation, |
Description
This PR contains some simple performance improvements to the ldm matcher, together with some other cosmetic changes that I made "en passant" (variable renames, typo fixes, ...).
The changes to the ldm matcher only affect performance and the new code should produce precisely the same output as the baseline. Two changes helped improve the performance relatively uniformly:
ZSTD_ldm_generateSequenceswe now compute a tag mask once before entering the loop and use it to check if the rolling hash satisfies the "tag condition". The tag mask is computed in such a way that the new tag condition is equivalent to the original one.ZSTD_ldm_insertEntryinstead ofZSTD_ldm_makeEntryAndInsertByTagwhich performs some redundant checks.Benchmarks
The following files were used to assess the impact of the change:
hhvm-rt.tar(5.9G) - an image of production version of HHVMl1m.tar(2.0G) - two snapshots of the linux kernel source code taken a month apartl1y.tar(1.9G) - two snapshots of the linux kernel source code taken a year apartl5.tar(544M) - the first 544M of an archive containing the linux kernel source codeThe results are compiled in the table below. All times are computed as the best over a set of 5 runs.
--long=27 -1--long=27 -1--long=27 -1--long=27 -1--long=30 -1--long=30 -1--long=30 -1--long=30 -1--long=27 -8--long=27 -8--long=27 -8--long=27 -8--long=30 -8--long=30 -8--long=30 -8--long=30 -8--long=27 -16--long=27 -16--long=27 -16--long=27 -16--long=30 -16--long=30 -16--long=30 -16--long=30 -16--long=27 -19--long=27 -19--long=27 -19--long=27 -19--long=30 -19--long=30 -19--long=30 -19--long=30 -19