Add determinism fuzzers and fix rare determinism bugs#2648
Merged
terrelln merged 4 commits intofacebook:devfrom May 15, 2021
Merged
Add determinism fuzzers and fix rare determinism bugs#2648terrelln merged 4 commits intofacebook:devfrom
terrelln merged 4 commits intofacebook:devfrom
Conversation
Contributor
Author
|
This is a non-trivial change. It isn't terribly complex, but it is not trivial. If we wanted to merge this into I don't think it is terribly critical to merge, since these are longstanding bugs. The only one that might impact us in practice is the dictionary invalidation bug, which might be losing us a few percent of speed. |
9d41f0e to
725c5e4
Compare
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of very long matches. https://github.com/facebook/zstd/blob/40def70387f99b239f3f566ba399275a8fd44cde/lib/compress/zstd_opt.c#L476 This optimization is based off the length longest match found. However, when indices are reset, we only ensure that we can reference the whole window starting from `ip`. If the previous block ended with a long match then `nextToUpdate` could be much less than `ip`. It might be far enough back that `nextToUpdate < maxDist`, so it doesn't have a full window of data to reference. This can cause non-determinism bugs, because we may find a match that is beyond `ip - maxDist`, and may sometimes be un-referencable, and that match triggers the speed optimization. The fix is to base the `windowLow` off of the `target` of `ZSTD_updateTree_internal()`, because anything below that value will be obsolete by the time `ZSTD_updateTree_internal()` completes.
The repcode checks disallowed repcodes that are equal to `windowLow`. This is slightly inefficient, but isn't a problem on its own. Together with the next commit, it cause non-determinism.
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the block. This ensures that `lowLimit` is updated to `dictLimit` whenever the ext-dict is out of range, so we can use prefix mode for speed. This can cause non-determinism because prefix mode and ext-dict mode match finders can return different results. It can also hurt speed because ext-dict match finders are slower. The scenario is: 1. Compress large data with a dictionary. 2. The dictionary goes out of bounds, so we invalidate it. 3. However, we still have `lowLimit < dictLimit`, since it is never updated. 4. We will call the ext-dict match finder instead of the prefix one.
Compress the input twice in the `simple_round_trip` and `dictionary_round_trip` fuzzers with exactly the same parameters, but reusing the context. Then ensure that the compressed output is identical.
Contributor
|
A few percent of speed, for a rare corner case, doesn't feel critical. |
Cyan4973
approved these changes
May 14, 2021
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See each commit message for details.
lowLimitwhen it is too low for the beginning of the blocksimple_round_tripanddictionary_round_trip